Using a Gradient Boosted Model for Case Ascertainment from Free-Text Veterinary Records.

Preventive Veterinary Medicine (2023)

Abstract
Case ascertainment for prevalence and incidence studies from veterinary clinical data poses a major challenge because medical notes are not consistently structured or complete. Using natural language processing (NLP) and machine learning, this study aimed to obtain accurate case recognition for feline upper respiratory tract infections (primarily caused by viruses such as feline herpesvirus (FHV-1) and feline calicivirus (FCV), and bacteria such as Chlamydophila felis, Mycoplasma felis and Bordetella bronchiseptica) using retrospective electronic veterinary records from the Royal Society for the Prevention of Cruelty to Animals, Queensland (RSPCA Qld). Data cleaning and NLP were carried out on eight years of free-text veterinary records from RSPCA Qld to derive text-based predictors. The NLP steps included sorting records by length of stay, vectorising, tokenising and spell checking against a bespoke veterinary database. A gradient boosted model (GBM) was trained to predict the probability of each animal having a diagnosis of upper respiratory infection. A manually annotated dataset was used to train the algorithm to learn dominant patterns between predictors (frequencies of n-grams) and responses (manual binary case classification). The GBM's performance was tested against an out-of-sample validation dataset, and model-agnostic methods were used to interrogate the model's learning process. The GBM used patient-level frequencies of 1250 unique n-grams as predictor variables and predicted the probability of cases in the validation dataset with an accuracy of 0.95 (95% CI 0.92, 0.97) and an F1 score of 0.96. Predictors that exerted the highest influence on the model included frequencies of "doxycycline", "flu", "sneezing", "doxybrom" and "ocular". The trained GBM was deployed on the full dataset spanning eight years, comprising 60,258 clinical entries. The prevalence in the full dataset was predicted to be 23.59%, which is in line with domain expertise from practicing veterinarians at the shelter. Case ascertainment is a crucial step for further epidemiological study of cat flu. Ultimately, this tool can be extended to other clinical procedures, conditions, and diseases, such as intensive care treatment for snake bites and tick paralysis; physical injuries such as orthopaedic fractures or chest injuries; and labour-intensive infectious diseases like parvovirus, canine cough, and ringworm, all of which require prolonged quarantine and care.
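
The abstract describes the modelling pipeline only at a high level. As a rough illustration of the idea (patient-level n-gram frequencies feeding a gradient boosted classifier), the following pure-Python sketch may help. It is not the authors' implementation: the toy notes and labels, the function names, the presence/absence decision stumps, and the hyperparameters are all assumptions made for this example, and the study itself used 1250 n-grams and a manually annotated training set.

```python
import math
import re
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count 1..n_max word n-grams in one patient's concatenated notes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(rows, residuals, vocab):
    """Least-squares stump on 'term present vs absent' (a simplification)."""
    best = None
    for term in vocab:
        absent = [r for row, r in zip(rows, residuals) if row[term] == 0]
        present = [r for row, r in zip(rows, residuals) if row[term] > 0]
        if not absent or not present:
            continue  # this term does not split the data
        a_mean = sum(absent) / len(absent)
        p_mean = sum(present) / len(present)
        sse = (sum((r - a_mean) ** 2 for r in absent)
               + sum((r - p_mean) ** 2 for r in present))
        if best is None or sse < best[0]:
            best = (sse, term, a_mean, p_mean)
    if best is None:
        raise ValueError("no term splits the training rows")
    return best[1:]


def fit_gbm(rows, labels, rounds=20, learning_rate=0.5):
    """Gradient boosting on log loss: each stump fits y - p, the negative gradient."""
    vocab = sorted(set().union(*rows))
    base_rate = sum(labels) / len(labels)  # assumes both classes are present
    f0 = math.log(base_rate / (1.0 - base_rate))
    scores = [f0] * len(rows)
    stumps = []
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        term, a_mean, p_mean = fit_stump(rows, residuals, vocab)
        stumps.append((term, a_mean, p_mean))
        scores = [s + learning_rate * (p_mean if row[term] > 0 else a_mean)
                  for s, row in zip(scores, rows)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    """Predicted probability that a patient is a case."""
    f0, learning_rate, stumps = model
    score = f0 + learning_rate * sum(
        p_mean if row[term] > 0 else a_mean for term, a_mean, p_mean in stumps)
    return sigmoid(score)


# Hypothetical annotated notes (1 = upper respiratory infection case).
annotated = [
    ("sneezing and ocular discharge, started doxycycline", 1),
    ("cat flu suspected, sneezing, doxycycline given", 1),
    ("desexed today, recovery uneventful", 0),
    ("vaccination and microchip, healthy", 0),
    ("ocular discharge and sneezing noted", 1),
    ("wound on leg cleaned, healthy otherwise", 0),
]
rows = [ngram_counts(text) for text, _ in annotated]
labels = [label for _, label in annotated]
model = fit_gbm(rows, labels)

p_case = predict_proba(model, ngram_counts("sneezing overnight, eyes weepy"))
p_non_case = predict_proba(model, ngram_counts("healthy cat, vaccination due"))
```

On this toy data the first stump splits on "sneezing", because it is the only term that separates the labelled cases from the non-cases. A production pipeline would more likely use an established library implementation with count-based thresholds, a held-out validation set, and the spell checking step described above.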
Keywords
Machine learning, Gradient boosted model, Case ascertainment, Shelter, Herpes virus, Chlamydophila felis, Mycoplasma felis, Bordetella bronchiseptica, Calici virus, Feline