Metric Similarity Joins Using Mapreduce (Extended Abstract)

2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE)(2018)

引用 0|浏览39
暂无评分
摘要
Given two object sets Q and O, a metric similarity join finds similar object pairs according to a certain criterion. This operator has a wide range of applications in data cleaning, data mining, etc. In this paper, we employ a popular distributed framework, namely, MapReduce, to support scalable metric similarity joins. To ensure load balancing, we present two sampling based partition methods, i.e., clustering based partition method and KD-tree based partition method. To avoid unnecessary object pair evaluation, we propose a framework that maps the two involved object sets in order, where plane sweeping and pivot based filtering techniques are utilized for pruning. Extensive experiments confirm that our solution outperforms significantly existing state-of-the-art competitors.
更多
查看译文
关键词
Similarity Joins,Metric Space,MapReduce,Algorithm
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要