Ontology-Based Supervised Concept Learning for the Biogeochemical Literature
2018 IEEE International Conference on Information Reuse and Integration (IRI) (2018)
Abstract
Academic literature search is a vital step of every research project, especially given the increasingly rapid growth of scientific knowledge. Semantic academic literature search retrieves and ranks scientific articles using concepts, in an attempt to address well-known deficiencies of keyword-based search. The difficulty of semantic search, however, is that it requires significant knowledge engineering, often in the form of conceptual ontologies tailored to a particular scientific domain. It also requires non-trivial tuning, in the form of domain-specific term and concept weights. As part of an ongoing project seeking to build a domain-specific semantic search system, we present an ontology-based supervised concept learning approach for the biogeochemical scientific literature. We first discuss the creation of a dataset of scientific articles in the biogeochemical domain annotated using the Environment Ontology (ENVO). Next, we present a supervised machine learning classifier (a random decision forest) that uses a distinctive set of features to learn ENVO concepts and then label and index scientific articles at the sentence level. Finally, we evaluate our approach against two baseline methods, keyword-based and bag-of-words, achieving an overall F_1 measure of 0.76, an improvement of approximately 50%.
Keywords
Natural Language Processing, Semantic Search, Academic Search, Ontologies, Machine Learning
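
To make the evaluation setup in the abstract concrete, the following is a minimal sketch of a keyword-based sentence labeler scored with the F_1 measure. It is not the authors' implementation: the lexicon, the example sentences, the gold annotations, and the function names `keyword_label` and `f1_score` are all hypothetical, and the concept labels are toy strings rather than real ENVO identifiers. Micro-averaging over per-sentence label sets is also an assumption, since the abstract does not say how F_1 was aggregated.

```python
import re

# Hypothetical toy lexicon: concept label -> trigger keywords.
# Illustrative only; these are not real ENVO identifiers or synonyms.
LEXICON = {
    "soil": {"soil", "soils"},
    "lake": {"lake", "lakes"},
    "sediment": {"sediment", "sediments"},
}


def tokenize(sentence):
    """Lowercase the sentence and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def keyword_label(sentence, lexicon=LEXICON):
    """Label a sentence with every concept whose keywords appear in it."""
    tokens = tokenize(sentence)
    return {concept for concept, keys in lexicon.items() if tokens & keys}


def f1_score(gold, predicted):
    """Micro-averaged F_1 over per-sentence sets of concept labels."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences with hand-assigned gold labels.
sentences = [
    "Nitrogen fluxes were measured in lake sediments.",
    "Carbon storage in the soil declined after drainage.",
    "Samples were taken from the littoral zone.",
]
gold = [{"lake", "sediment"}, {"soil"}, {"lake"}]

predicted = [keyword_label(s) for s in sentences]
print(predicted)
print(round(f1_score(gold, predicted), 3))
```

In this toy data the third sentence refers to a concept without using any of its keywords, so the keyword labeler misses it and recall drops. That is the kind of deficiency of keyword-based search that a learned, concept-level classifier is meant to address.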