Combine-and-conquer: improving the diversity in similarity search through influence sampling

Lucio F. D. Santos,Willian D. Oliveira,Luiz Olmes Carvalho,Mônica Ribeiro Porto Ferreira,Agma J. M. Traina,Caetano Traina Jr.

SAC 2015: Symposium on Applied Computing Salamanca Spain April, 2015（2015）

引用 3|浏览38

暂无评分

摘要

Result diversification methods are intended to retrieve elements similar to a given object whereas also enforcing a certain degree of diversity among them, aimed at improving the answer relevance. Most of the methods are based on optimization, but bearing NP-hard solutions. Diversity is injected into an otherwise all-too-similar result set in two phases: in the first, the search space is reduced to speed up finding the optimal solution, whereas in the second a trade-off between diversity and similarity over the reduced space is obtained. It is assumed that the first phase is achieved by applying a traditional nearest neighbor algorithm, but no previous investigation evaluated the impact of the first over the second phase. In this paper, we devised alternative techniques to execute the first phase and evaluated how obtaining a better quality set of elements in the first phase can improve the diversity. Besides the traditional nearest neighbor-based pre-selection, we also considered naive random selection, cluster-based and influence-based ones. Thereafter, extensive experiments evaluated a number of state-of-the-art diversity algorithms employed in the second phase, regarding both processing time and answer quality. The obtained results have shown that although the much more elaborated (and much more time consuming) methods indeed provide best answers, other alternatives are able to provide a better commitment regarding quality and performance. Moreover, the pre-selection techniques can reduce the total running time by up to two orders of magnitude.

查看译文

关键词

Search result diversification, similarity search, sampling

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要