A Probabilistic Model For Linking Named Entities In Web Text With Heterogeneous Information Networks

SIGMOD/PODS'14: International Conference on Management of Data, Snowbird, Utah, USA, June 2014

Abstract
Heterogeneous information networks that consist of multi-type, interconnected objects, such as social media networks and bibliographic networks, are becoming ubiquitous and increasingly popular. The task of linking named entity mentions detected in unstructured Web text with their corresponding entities in a heterogeneous information network is of practical importance for information network population and enrichment. This task is challenging because of name ambiguity and the limited knowledge available in the information network. Most existing entity linking methods focus on linking entities with Wikipedia or Wikipedia-derived knowledge bases (e.g., YAGO), and depend heavily on features specific to Wikipedia (e.g., Wikipedia articles or Wikipedia-based relatedness measures). Since heterogeneous information networks do not have such features, these previous methods cannot be applied to our task. In this paper, we propose SHINE, which is, to the best of our knowledge, the first probabilistic model to link the named entitieS in Web text with a Heterogeneous Information NEtwork. Our model consists of two components: the entity popularity model, which captures the popularity of an entity, and the entity object model, which captures the distribution of multi-type objects appearing in the textual context of an entity and is generated using meta-path constrained random walks over the network. Because different meta-paths express diverse semantic meanings and lead to different distributions over objects, different meta-paths carry different weights in entity linking. We propose an effective iterative approach, based on the expectation-maximization (EM) algorithm, that automatically learns the weight of each meta-path without requiring any training data. Experimental results on a real-world data set demonstrate the effectiveness and efficiency of our proposed model in comparison with the baselines.
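To make the idea of a meta-path constrained random walk more concrete, the following is a minimal illustrative sketch, not the SHINE implementation. It assumes a made-up toy bibliographic network (authors, papers, venues), a hypothetical helper `walk_distribution` that follows only edges matching a given sequence of object types, and a hypothetical helper `mixed_distribution` that combines several meta-paths with hand-picked weights. In the paper the weights are learned with EM; that step is not shown here.

```python
from collections import defaultdict

# Toy typed adjacency lists: EDGES[(src_type, dst_type)][src_node] -> neighbors.
# The network and all node names are invented purely for illustration.
EDGES = {
    ("author", "paper"): {"a1": ["p1", "p2"], "a2": ["p3"]},
    ("paper", "venue"): {"p1": ["SIGMOD"], "p2": ["VLDB"], "p3": ["SIGMOD"]},
    ("paper", "author"): {"p1": ["a1", "a3"], "p2": ["a1"], "p3": ["a2"]},
}


def walk_distribution(start, meta_path):
    """Distribution over end nodes of a walk from `start` along `meta_path`.

    At each step the walker moves uniformly at random to a neighbor of the
    type required by the meta-path. Probability mass that reaches a node
    with no neighbor of the required type is dropped.
    """
    dist = {start: 1.0}
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        adjacency = EDGES.get((src_type, dst_type), {})
        next_dist = defaultdict(float)
        for node, prob in dist.items():
            neighbors = adjacency.get(node, [])
            for neighbor in neighbors:
                next_dist[neighbor] += prob / len(neighbors)
        dist = dict(next_dist)
    return dist


def mixed_distribution(start, weighted_paths):
    """Weighted mixture of per-meta-path distributions (weights given by hand)."""
    mixed = defaultdict(float)
    for meta_path, weight in weighted_paths:
        for node, prob in walk_distribution(start, meta_path).items():
            mixed[node] += weight * prob
    return dict(mixed)


if __name__ == "__main__":
    paths = [
        (("author", "paper", "venue"), 0.6),   # venues the author publishes in
        (("author", "paper", "author"), 0.4),  # authors reached via shared papers
    ]
    print(mixed_distribution("a1", paths))
```

Under these assumptions, each meta-path yields its own distribution over objects for the same entity, which is why assigning a separate weight to each path is meaningful.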
Keywords
Entity linking, Heterogeneous information networks, Domain-specific entity linking