Scaling Entity Resolution to Large, Heterogeneous Data with Enhanced Meta-blocking.

Extending Database Technology (2016)

Abstract
Entity Resolution is an inherently quadratic task that typically scales to large entity collections through blocking. The resulting blocks can be restructured by Meta-blocking in order to significantly increase precision at a limited cost in recall. Yet, Meta-blocking itself can be time-consuming, while its precision remains poor for configurations with high recall. In this work, we propose new meta-blocking methods that improve precision by up to an order of magnitude at a negligible cost to recall. We also introduce two efficiency techniques that, when combined, reduce the overhead time of Meta-blocking by more than an order of magnitude. We evaluate our approaches through an extensive experimental study over 6 real-world, heterogeneous datasets. The outcomes indicate that our new algorithms outperform all existing meta-blocking techniques, as well as the state-of-the-art methods for block processing, in all respects.
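To make the trade-off in the abstract concrete, the following is a minimal sketch of blocking followed by a simple meta-blocking step. It is not the method proposed in the paper. It assumes token blocking, weights each candidate pair by the number of blocks the two entities share, and prunes pairs below the mean weight, which is one commonly described baseline scheme. The toy profiles and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy entity profiles (hypothetical data, made up for illustration).
profiles = {
    "e1": "john smith london",
    "e2": "john smith uk",
    "e3": "jane doe paris",
    "e4": "jane doe london",
}

def token_blocking(profiles):
    """Place every entity in one block per token it contains."""
    blocks = defaultdict(set)
    for eid, text in profiles.items():
        for token in text.split():
            blocks[token].add(eid)
    # A block with a single entity yields no comparisons, so drop it.
    return {t: ids for t, ids in blocks.items() if len(ids) > 1}

def edge_weights(blocks):
    """Weight each candidate pair by the number of blocks it shares.

    Assumption: this co-occurrence count is one simple weighting
    scheme, not necessarily the one used in the paper.
    """
    weights = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    return dict(weights)

def prune_by_mean_weight(weights):
    """Keep only pairs whose weight is at least the mean edge weight."""
    if not weights:
        return set()
    mean = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= mean}

n = len(profiles)
brute_force = n * (n - 1) // 2          # quadratic baseline: all pairs
blocks = token_blocking(profiles)
weights = edge_weights(blocks)          # distinct candidate pairs after blocking
kept = prune_by_mean_weight(weights)    # candidate pairs after pruning

print(brute_force, len(weights), len(kept))
```

In this toy setting, blocking reduces the candidate pairs below the brute-force count, and pruning discards the pair that shares only a single block. Whether such pruning costs recall depends on the data and the weighting scheme.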
Keywords
entity resolution, heterogeneous data, meta-blocking