Taxonomy-based information content and WordNet-Wiktionary-Wikipedia glosses for semantic relatedness

Applied Intelligence (2016)

Abstract
Computing the semantic similarity/relatedness between terms is an important research area for several disciplines, including artificial intelligence, cognitive science, linguistics, psychology, biomedicine and information retrieval. Similarity and relatedness measures exploit knowledge bases to express the semantics of concepts. Some approaches, such as the information theoretical approaches, rely on knowledge structure, while others, such as the gloss-based approaches, use knowledge content. First, on the structural side, we propose a new intrinsic Information Content (IC) computing method based on quantifying the subgraph formed by the ancestors of the target concept. Taxonomic measures, including the IC-based ones, rely on topological parameters that must be extracted from taxonomies treated as Directed Acyclic Graphs (DAGs). Accordingly, we propose a set of graph algorithms that provide basic parameters such as depth, ancestors, descendants and the Lowest Common Subsumer (LCS). The IC-computing method is assessed on several knowledge structures: the noun and verb WordNet “is a” taxonomies, the Wikipedia Category Graph (WCG) and the MeSH taxonomy. We also propose an aggregation schema that exploits the WordNet “is a” taxonomy and the WCG in a complementary way through the IC-based measures to improve coverage. Second, on the content side, we propose a gloss-based semantic similarity measure that relies on a noun weighting mechanism driven by our IC-computing method, as well as on the WordNet, Wiktionary and Wikipedia resources. Further evaluation is performed on various items, including nouns, verbs, multiword expressions and biomedical datasets, using well-recognized benchmarks. The results indicate an improvement in similarity and relatedness assessment accuracy.
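To make the topological parameters named in the abstract concrete, the following is a minimal Python sketch of how ancestors, descendants, depth and the LCS could be extracted from a taxonomy viewed as a DAG. It is not the paper's routine: the toy taxonomy `PARENTS`, the function names, the choice of longest-path depth, and the choice of "deepest common subsumer" as the LCS convention are all assumptions made for illustration. The IC formula itself is not given in the abstract and is deliberately left out.

```python
# Hypothetical toy "is a" taxonomy (child -> list of parents).
# It is a DAG: "dog" and "cat" each have two parents.
PARENTS = {
    "entity": [],
    "animal": ["entity"],
    "pet": ["entity"],
    "mammal": ["animal"],
    "dog": ["mammal", "pet"],
    "cat": ["mammal", "pet"],
}


def ancestors(node, parents=PARENTS):
    """Return the set of all proper ancestors of `node`."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def descendants(node, parents=PARENTS):
    """Return the set of all proper descendants of `node`."""
    return {n for n in parents if node in ancestors(n, parents)}


def depth(node, parents=PARENTS):
    """Depth as the longest path from a root (assumed convention)."""
    if not parents[node]:
        return 0
    return 1 + max(depth(p, parents) for p in parents[node])


def lowest_common_subsumers(a, b, parents=PARENTS):
    """Common subsumers of `a` and `b` with maximal depth (assumed convention)."""
    common = (ancestors(a, parents) | {a}) & (ancestors(b, parents) | {b})
    if not common:
        return set()
    best = max(depth(c, parents) for c in common)
    return {c for c in common if depth(c, parents) == best}
```

The set returned by `ancestors` is the node set of the ancestor subgraph that, according to the abstract, the proposed IC method quantifies. On a taxonomy the size of WordNet, the recursive `depth` and the quadratic `descendants` would need memoization or a single topological pass.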
Keywords
Information content, Gloss, WordNet, Wikipedia, Wiktionary, MeSH, DAG algorithms, Semantic similarity, Semantic relatedness