BioAct: Biomedical Knowledge Base Construction using Active Learning

biorxiv(2022)

引用 0|浏览10
暂无评分
摘要
Creating and curating knowledge resources has been a paramount activity in the biomedical domain. In recent years, automated methods for knowledge base construction have flourished and have enabled large scale construction and curation of such resources. In the biological domain, techniques such as next generation sequencing produce new data at exponential rate, making mere manual curation of knowledge resources simply unfeasible. The major technology to automate knowledge base construction is Information Extraction (IE) \---| specifically tasks such as Named Entity Recognition or Relation Extraction. The major hurdle for IE methods is the availability of labelled data for training, which can be prohibitively expensive and challenging to obtain due to the need of domain experts. Active learning aims at minimizing the cost of manual labelling by only requiring it for smaller and more useful portions of the data. With this motivation, we devised a method to quickly construct highly curated datasets to enable biomedical knowledge base construction. The method, named BioAct , is based on a partnership between automatic annotation methods (leveraging SciBERT with other machine learning methods) and subject matter experts and uses active learning to create training datasets in the biological domain. The main contribution of this work is twofold; in addition to the BioAct method itself, we publicly release an annotated dataset on antimicrobial resistance, produced by a team of subject matter experts using BioAct . Additionally, we simulate a knowledge base construction task using the MegaRes and CARD knowledge bases to provide insight and lessons learned about the usefulness of the annotated dataset for this task. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
biomedical knowledge base construction,active learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要