Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks.

CIKM(2017)

引用 25|浏览114
暂无评分
摘要
Measuring network similarity is a fundamental data mining problem. The mainstream similarity measures mainly leverage the structural information regarding to the entities in the network without considering the network semantics. In the real world, the heterogeneous information networks (HINs) with rich semantics are ubiquitous. However, the existing network similarity doesn't generalize well in HINs because they fail to capture the HIN semantics. The meta-path has been proposed and demonstrated as a right way to represent semantics in HINs. Therefore, original meta-path based similarities (e.g., PathSim and KnowSim) have been successful in computing the entity proximity in HINs. The intuition is that the more instances of meta-path(s) between entities, the more similar the entities are. Thus the original meta-path similarity only applies to computing the proximity of two neighborhood (connected) entities. In this paper, we propose the distant meta-path similarity that is able to capture HIN semantics between two distant (isolated) entities to provide more meaningful entity proximity. The main idea is that even there is no shared neighborhood entities of (i.e., no meta-path instances connecting) the two entities, but if the more similar neighborhood entities of the entities are, the more similar the two entities should be. We then find out the optimum distant meta-path similarity by exploring the similarity hypothesis space based on different theoretical foundations. We show the state-of-the-art similarity performance of distant meta-path similarity on two text-based HINs and make the datasets public available.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要