Constructing and Mining Heterogeneous Information Networks from Massive Text

Jingbo Shang,Jiaming Shen,Liyuan Liu,Jiawei Han

KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Anchorage AK USA August, 2019（2019）

引用 4|浏览390

暂无评分

摘要

Real-world data exists largely in the form of unstructured texts. A grand challenge on data mining research is to develop effective and scalable methods that may transform unstructured text into structured knowledge. Based on our vision, it is highly beneficial to transform such text into structured heterogeneous information networks, on which actionable knowledge can be generated based on the user's need. In this tutorial, we provide a comprehensive overview on recent research and development in this direction. First, we introduce a series of effective methods that construct heterogeneous information networks from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks based on the user's need. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.

查看译文

关键词

entity recognition, massive text corpora, network mining and applications, phrase mining, taxonomy construction

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要