FuseM: Query-Centric Data Fusion on Structured Web Markup

2017 IEEE 33rd International Conference on Data Engineering (ICDE)(2017)

引用 8|浏览29
暂无评分
摘要
Embedded markup based on Microdata, RDFa, and Microformats have become prevalent on the Web and constitute an unprecedented source of data. However, RDF statements extracted from markup are fundamentally different from traditional RDF graphs: entity descriptions are flat, facts are highly redundant, and despite very frequent co-references explicit links are missing. Therefore, carrying out typical entity-centric tasks such as retrieval and summarisation cannot be tackled sufficiently with state-of-the-art methods and require preliminary data fusion. Given the scale and dynamics of Web markup, the applicability of general data fusion approaches is limited. We present a novel query-centric data fusion approach which overcomes such issues through a combination of entity retrieval and fusion techniques geared towards the specific challenges associated with embedded markup. To ensure precise and diverse entity descriptions, we follow a supervised learning approach and train a classifier for data fusion of a pool of candidate facts relevant to a given query and obtained through a preliminary entity retrieval step. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant improvement over existing baselines. In addition, an investigation into the coverage and complementarity of facts from the constructed entity descriptions compared to DBpedia, shows potential for aiding tasks such as knowledge base population.
更多
查看译文
关键词
FuseM,query-centric data fusion,structured Web markup,embedded markup,Microdata,microformats,World Wide Web,RDF statements,RDF graphs,entity retrieval,diverse entity descriptions,supervised learning,Web Data Commons dataset,DBpedia
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要