Large-Scale Application of Named Entity Recognition to Biomedicine and Epidemiology

medRxiv (Cold Spring Harbor Laboratory) (2022)

Abstract
Background: Despite significant advances in biomedical named entity recognition methods, the clinical application of these systems still faces several challenges: (1) most methods are trained on a limited set of clinical entities; (2) they rely heavily on large amounts of data for both pretraining and prediction, which makes their use in production impractical; (3) they do not consider non-clinical entities that are also related to a patient's health, such as social, economic, or demographic factors.

Methods: In this paper, we develop Bio-Epidemiology-NER (https://pypi.org/project/Bio-Epidemiology-NER/), an open-source Python package for detecting biomedical named entities in text. The approach is Transformer-based and is trained on a dataset annotated with many named entity types (medical, clinical, biomedical, and epidemiological). It improves on previous efforts in three ways: (1) it recognizes many clinical entity types, such as medical risk factors, vital signs, drugs, and biological functions; (2) it is easily configurable and reusable, and can scale up for training and inference; (3) it also considers non-clinical factors (such as age, gender, race, and social history) that influence health outcomes. At a high level, it consists of four phases: preprocessing, data parsing, named entity recognition, and named entity enhancement.

Results: Experimental results show that our pipeline outperforms other methods on three benchmark datasets, with macro- and micro-averaged F1 scores of around 90 percent and above.

Conclusion: This package is publicly available for researchers, doctors, clinicians, and anyone else who needs to extract biomedical named entities from unstructured biomedical texts.

Author Summary: This paper presents a Python package (https://pypi.org/project/Bio-Epidemiology-NER/) that extracts named entities from biomedical texts. Unlike previous work, this package extracts not only clinical entities, such as diseases, signs, and symptoms, but also patient demographics. It requires minimal code and can be used by epidemiologists, doctors, practitioners, or others in the field to view the named entities in texts. The knowledge gained from the named entities helps end users see statistics on, or the spread of, infectious disease quickly while parsing large amounts of free text.
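The four phases named in the Methods paragraph (preprocessing, data parsing, named entity recognition, and named entity enhancement) can be illustrated with a minimal, self-contained sketch. This is not the Bio-Epidemiology-NER implementation or its API: the Transformer model is replaced here by a small hypothetical lexicon and one regular expression so the example runs without any model download, and every function name, label name, and the example sentence are assumptions made for illustration only.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for the trained Transformer model.
# Label names are illustrative, not the package's actual label set.
TOY_LEXICON = {
    "fever": "Sign_symptom",
    "cough": "Sign_symptom",
    "aspirin": "Medication",
    "female": "Sex",
}
# Hypothetical pattern for a demographic (non-clinical) entity.
AGE_PATTERN = re.compile(r"\b\d{1,3}-year-old\b")


@dataclass
class Entity:
    text: str
    label: str
    start: int  # character offset in the preprocessed text
    end: int


def preprocess(raw: str) -> str:
    """Phase 1: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def parse(text: str) -> list[tuple[int, str]]:
    """Phase 2: split into sentences, keeping each sentence's start offset."""
    sentences = []
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        chunk = match.group()
        leading = len(chunk) - len(chunk.lstrip())
        sentence = chunk.strip()
        if sentence:
            sentences.append((match.start() + leading, sentence))
    return sentences


def recognize(sentences: list[tuple[int, str]]) -> list[Entity]:
    """Phase 3: tag entities (toy lookup in place of a neural tagger)."""
    found = []
    for offset, sentence in sentences:
        for term, label in TOY_LEXICON.items():
            pattern = rf"\b{re.escape(term)}\b"
            for m in re.finditer(pattern, sentence, re.IGNORECASE):
                found.append(Entity(m.group(), label, offset + m.start(), offset + m.end()))
        for m in AGE_PATTERN.finditer(sentence):
            found.append(Entity(m.group(), "Age", offset + m.start(), offset + m.end()))
    return found


def enhance(entities: list[Entity]) -> tuple[list[Entity], Counter]:
    """Phase 4: deduplicate, order by position, and count entities per label."""
    unique = {(e.start, e.end, e.label): e for e in entities}
    ordered = sorted(unique.values(), key=lambda e: e.start)
    return ordered, Counter(e.label for e in ordered)


def run_pipeline(raw: str) -> tuple[str, list[Entity], Counter]:
    """Chain the four phases and return the cleaned text with its entities."""
    text = preprocess(raw)
    entities, counts = enhance(recognize(parse(text)))
    return text, entities, counts


if __name__ == "__main__":
    note = "A 45-year-old  female presented with fever and cough.\nShe was given aspirin."
    _, entities, counts = run_pipeline(note)
    for entity in entities:
        print(entity)
    print(dict(counts))
```

Keeping character offsets through every phase means each extracted entity can be traced back to its position in the cleaned text, and the per-label counts in the last phase are the kind of aggregate that the Author Summary describes as useful for quick statistics over large amounts of free text.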
Keywords
named entity recognition, biomedicine, epidemiology, large-scale