A Flexible Text Mining System for Entity and Relation Extraction in PubMed.

CIKM(2015)

Abstract
The number of scientific publications is too large to handle manually, so there is rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. We present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. We developed the system by extending Stanford CoreNLP, and it supports multiple types of entities and relations. We evaluate named entity recognition (NER) on the CRAFT, GENETAG, AnEM, NCBI Disease, DDI, and Metabolite and Enzyme corpora, and relation extraction (RE) on BioInfer, AIMed, GAD, CoMAGC, and PolySearch, achieving average F-measures of 85% for entity extraction and 82% for relation extraction. The system has two main advantages. The first is configurability: text-processing components can be plugged in, in various combinations, for different tasks. The second is an extensible extraction framework, including an extensible rule engine for relation extraction (a plug-and-play approach). As shown in Figure 1, the system contains two major pipelines for public knowledge discovery. The first pipeline extracts target entities based on dictionaries by extending Stanford CoreNLP. The second pipeline applies dependency tree-based rules to sentences that contain two or more entities in order to extract relationships among those entities.
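The two-pipeline flow described in the abstract can be illustrated with a minimal Python sketch. This is not PKDE4J code and does not use Stanford CoreNLP: the dictionary `ENTITY_DICT`, the trigger table `TRIGGERS`, and both function names are hypothetical, and a simple "trigger word between two entities" check stands in for the dependency tree-based rules, which would require a parser.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary: surface form -> entity type (not taken from PKDE4J).
ENTITY_DICT = {
    "aspirin": "DRUG",
    "warfarin": "DRUG",
    "brca1": "GENE",
    "breast cancer": "DISEASE",
}

# Hypothetical trigger words -> relation label.
# Simplified stand-in for the dependency tree-based rules named in the abstract.
TRIGGERS = {
    "interacts": "INTERACTION",
    "inhibits": "INHIBITION",
    "associated": "ASSOCIATION",
}


def extract_entities(sentence):
    """Pipeline 1: return (start, end, text, type) for each dictionary term found."""
    found = []
    # Longer terms are tried first so they win over shorter overlapping terms.
    for term in sorted(ENTITY_DICT, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end, _, _ in found):
                continue  # skip matches that overlap an already accepted entity
            found.append((m.start(), m.end(), m.group(0), ENTITY_DICT[term]))
    return sorted(found)


def extract_relations(sentence):
    """Pipeline 2: apply rules only to sentences with two or more entities."""
    entities = extract_entities(sentence)
    if len(entities) < 2:
        return []
    relations = []
    for left, right in combinations(entities, 2):
        # Text between the two entity mentions, searched for a trigger word.
        between = sentence[left[1]:right[0]].lower()
        for trigger, label in TRIGGERS.items():
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((left[2], label, right[2]))
    return relations


if __name__ == "__main__":
    print(extract_relations("Aspirin interacts with warfarin."))
    print(extract_relations("BRCA1 is associated with breast cancer."))
```

Keeping the dictionary and the rule table as plain data mirrors the plug-and-play idea in the abstract: under this sketch's assumptions, supporting a new entity type or relation means adding entries rather than changing the extraction functions.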