Framework for entity extraction with verification: application to inference of data set usage in research publications

ELECTRONIC LIBRARY(2022)

引用 0|浏览5
暂无评分
摘要
Purpose The purpose of this paper is to propose an extensible framework for extracting data set usage from research articles. Design/methodology/approach The framework uses a training set of manually labeled examples to identify word features surrounding data set usage references. Using the word features and general entity identifiers, candidate data sets are extracted and scored separately at the sentence and document levels. Finally, the extracted data set references can be verified by the authors using a web-based verification module. Findings This paper successfully addresses a significant gap in entity extraction literature by focusing on data set extraction. In the process, this paper: identified an entity-extraction scenario with specific characteristics that enable a multiphase approach, including a feasible author-verification step; defined the search space for word feature identification; defined scoring functions for sentences and documents; and designed a simple web-based author verification step. The framework is successfully tested on 178 articles authored by researchers from a large research organization. Originality/value Whereas previous approaches focused on completely automated large-scale entity recognition from text snippets, the proposed framework is designed for a longer, high-quality text, such as a research publication. The framework includes a verification module that enables the request validation of the discovered entities by the authors of the research publications. This module shares some similarities with general crowdsourcing approaches, but the target scenario increases the likelihood of meaningful author participation.
更多
查看译文
关键词
Information extraction, Named entity recognition, Text mining, Data verification, Data set extraction, Research publications, Information retrieval
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要