Epidemiologic Information Discovery from Open-Access COVID-19 Case Reports Via Pretrained Language Model

SSRN Electronic Journal(2022)

引用 1|浏览13
暂无评分
摘要
Although open-access data are increasingly common and useful to epidemiological research, the curation of such datasets is resource-intensive and time-consuming. Despite the existence of a major source of COVID-19 data, the regularly disclosed case reports were often written in natural language with an unstructured format. Here, we propose a computational framework that can automatically extract epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model developed using deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to the COVID-19 case reports collected from mainland China, our framework outperforms all other state-of-the-art deep learning models. The information extracted from our approach is highly consistent with that obtained from the gold-standard manual coding, with a matching rate of 80%. To disseminate our algorithm, we provide an open-access online platform that is able to estimate key epidemiological statistics in real time, with much less effort for data curation.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要