Dr. Inventor Framework: Extracting Structured Information From Scientific Publications

DISCOVERY SCIENCE, DS 2015(2015)

引用 33|浏览21
暂无评分
摘要
Even if research communities and publishing houses are putting increasing efforts in delivering scientific articles as structured texts, nowadays a considerable part of on-line scientific literature is still available in layout-oriented data formats, like PDF, lacking any explicit structural or semantic information. As a consequence the bootstrap of textual analysis of scientific papers is often a time-consuming activity. We present the first version of the Dr. Inventor Framework, a publicly available collection of scientific text mining components useful to prevent or at least mitigate this problem. Thanks to the integration and the customization of several text mining tools and on-line services, the Dr. Inventor Framework is able to analyze scientific publications both in plain text and PDF format, making explicit and easily accessible core aspects of their structure and semantics. The facilities implemented by the Framework include the extraction of structured textual contents, the discursive characterization of sentences, the identifications of the structural elements of both papers header and bibliographic entries and the generation of graph based representations of text excerpts. The Framework is distributed as a Java library. We describe in detail the scientific mining facilities included in the Framework and present two use cases where the Framework is respectively exploited to boost scientific creativity and to generate RDF graphs from scientific publications.
更多
查看译文
关键词
Scientific text mining, Scientific information extraction, Software framework
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要