Automatic taxonomy extraction in different languages using Wikipedia and minimal language-specific information

CICLing (1)(2012)

Cited by 11
Abstract
Knowledge bases extracted from Wikipedia are particularly useful for various NLP and Semantic Web applications because of their coverage, timeliness and multilingualism. This has led to many approaches for automatically extracting knowledge bases from Wikipedia. Most of these approaches rely on the English Wikipedia, as it is the largest Wikipedia version. However, each Wikipedia version contains socio-cultural knowledge, i.e., knowledge relevant to a specific culture or language. In this work, we describe a method for extracting a large set of hyponymy relations from the Wikipedia category system that can be used to acquire taxonomies in multiple languages. More specifically, we describe a set of 20 features that can be used for hyponymy detection without additional language-specific corpora. Finally, we evaluate our approach on Wikipedia in five different languages and compare the results with the WordNet taxonomy and with a multilingual approach based on Wikipedia's interwiki links.
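The abstract describes hyponymy detection over category links using features that need no extra language-specific corpora, followed by an evaluation against WordNet, but it does not list the 20 features. The sketch below is only a minimal illustration of the general shape of such a pipeline, under stated assumptions. Each subcategory-to-parent link is treated as a candidate hyponymy pair. Three hypothetical string features are combined with hand-set weights, not a trained classifier. The result is scored with precision and recall against a gold set. The names `CategoryEdge`, `edge_features`, `is_hyponym`, `WEIGHTS`, `THRESHOLD` and `precision_recall` are illustrative, and the toy edges and gold labels are invented, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryEdge:
    """A subcategory -> parent-category link taken from a Wikipedia category graph."""
    child: str
    parent: str


def tokens(title: str) -> list[str]:
    # Naive lower-cased whitespace tokenisation; titles in other languages
    # (compounds, inflection) would need more care.
    return title.lower().replace("_", " ").split()


def edge_features(edge: CategoryEdge) -> dict[str, float]:
    """Hypothetical string-based features; NOT the paper's 20 features."""
    c, p = tokens(edge.child), tokens(edge.parent)
    union = set(c) | set(p)
    return {
        # Every parent token also appears in the child title ("Jazz musicians" / "Musicians").
        "parent_tokens_in_child": float(bool(p) and set(p) <= set(c)),
        # Last tokens agree: a crude head-word guess that suits English word order.
        "last_token_match": float(bool(c) and bool(p) and c[-1] == p[-1]),
        # Jaccard overlap of the two token sets.
        "token_jaccard": len(set(c) & set(p)) / len(union) if union else 0.0,
    }


# Hand-picked illustrative weights and threshold (assumed, not learned).
WEIGHTS = {"parent_tokens_in_child": 1.0, "last_token_match": 1.0, "token_jaccard": 0.5}
THRESHOLD = 1.0


def is_hyponym(edge: CategoryEdge) -> bool:
    """Weighted sum of the features, compared against a fixed threshold."""
    feats = edge_features(edge)
    return sum(WEIGHTS[name] * value for name, value in feats.items()) >= THRESHOLD


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Compare predicted hyponymy edges with a gold set (e.g. edges mapped to WordNet)."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    edges = [
        CategoryEdge("Jazz musicians", "Musicians"),
        CategoryEdge("Guitarists", "Musicians"),
        CategoryEdge("Berlin", "Capitals in Europe"),
    ]
    gold = {edges[0], edges[1]}  # toy labels for illustration only
    predicted = {e for e in edges if is_hyponym(e)}
    print(precision_recall(predicted, gold))  # (1.0, 0.5)
```

The second toy edge, *Guitarists* under *Musicians*, shares no tokens with its parent, so surface features alone miss it and recall drops to 0.5. The last-token heuristic is also tied to English head-final word order. Both limitations hint at why a feature set meant to work across several languages would need to go beyond simple string overlap.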
Keywords
minimal language-specific information, automatic knowledge base extraction, largest Wikipedia version, hyponymy detection, large set, Wikipedia version, knowledge base, Wikipedia category system, automatic taxonomy extraction, multilingual approach, different language, socio-cultural knowledge, English Wikipedia, NLP