High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers

Beijing(2007)

Abstract
Citation indices are increasingly being used not only as navigational tools for researchers, but also as the basis for measurement of academic performance and research impact. This means that the reliability of tools used to extract citations and construct such indices is becoming more critical; however, existing approaches to citation extraction still fall short of the high accuracy required if critical assessments are to be based on them. In this paper, we present techniques for high accuracy extraction of citations from academic papers, designed for applicability across a broad range of disciplines and document styles. We integrate citation extraction, reference parsing, and author named entity recognition to significantly improve performance in citation extraction, and demonstrate this performance on a cross-disciplinary heterogeneous corpus. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.98 for author named entity recognition and 0.97 for citation extraction.
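For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure is computed from extraction counts. It assumes the balanced F1 variant (beta = 1); the abstract does not state which variant the authors used. The function name `f_measure` and the counts in the usage line are hypothetical illustrations, not figures from the paper.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score computed from raw extraction counts.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    paper's abstract only says "F-measure").
    """
    if true_positives == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: 90 citations extracted correctly, 10 spurious, 10 missed.
score = f_measure(true_positives=90, false_positives=10, false_negatives=10)
print(round(score, 2))
```

With these illustrative counts, precision and recall are both 0.9, so the score is also about 0.9. A score of 0.97 or 0.98 would require both precision and recall to be high, since the harmonic mean is pulled toward the lower of the two.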
Keywords
citation analysis,information retrieval,text analysis,academic papers,document styles,heterogeneous corpus,high accuracy citation extraction,named entity recognition,reference parsing,textual citation indices