RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature

crossref(2024)

引用 0|浏览2
暂无评分
摘要
Motivation: In the field of biomedical text mining, the ability to extract relations from literature is crucial for advancing both theoretical research and practical applications. There is a notable shortage of corpora designed to enhance the extraction of multiple types of relations, particularly focusing on proteins and protein-containing entities such as complexes and families, as well as chemicals. Results: In this work, we present RegulaTome, a corpus that overcomes the limitations of several existing biomedical relation extraction (RE) corpora, many of which concentrate on single-type relations at the sentence level. RegulaTome stands out by offering 16,962 relations annotated in over 2,500 documents, making it the most extensive dataset of its kind to date. This corpus is specifically designed to cover a broader spectrum of over 40 relation types beyond those traditionally explored, setting a new benchmark in the complexity and depth of biomedical RE tasks. Our corpus both broadens the scope of detected relations and allows for achieving noteworthy accuracy in RE. A transformer-based model trained on this corpus has demonstrated a promising F1-score (66.6%) for a task of this complexity, underscoring the effectiveness of our approach in accurately identifying and categorizing a wide array of biological relations. This achievement highlights RegulaTome's potential to significantly contribute to the development of more sophisticated, efficient, and accurate RE systems to tackle biomedical tasks. Finally, a run of the trained relation extraction system on all PubMed abstracts and PMC Open Access full-text documents resulted in over 18 million relations, extracted from the entire biomedical literature. Availability: The corpus and all introduced resources are openly accessible via Zenodo (https://zenodo.org/doi/10.5281/zenodo.10808330) and GitHub (https://github.com/farmeh/RegulaTome_extraction).
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要