Fishing in the stream: Similarity search over endless data

2017 IEEE International Conference on Big Data (Big Data), 2017

Abstract
Similarity search is the task of retrieving data items that are similar to a given query. In this paper, we introduce the time-sensitive notion of similarity search over endless data streams (SSDS), which takes into account data quality and temporal characteristics in addition to similarity. SSDS is challenging because it must process unbounded data with bounded computational resources. We propose Stream-LSH, a randomized SSDS algorithm that bounds the index size by retaining items according to their freshness, quality, and dynamic popularity. We show that, for the same space capacity, Stream-LSH achieves higher recall when searching for similar items than alternative approaches.
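
The abstract does not spell out Stream-LSH's internals, so the following is only a minimal Python sketch of the general idea of a space-bounded LSH index with a retention policy, not the authors' algorithm. Every specific here is an assumption: the hypothetical class `BoundedStreamLSH`, random-hyperplane hashing for cosine similarity, fixed-capacity FIFO buckets as the freshness policy, a quality score in [0, 1] that sets how many tables an item enters, and refreshing query hits as a stand-in for dynamic popularity.

```python
import math
import random
from collections import deque


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BoundedStreamLSH:
    """Hypothetical bounded LSH index over a stream (not the paper's Stream-LSH).

    Assumptions: item ids are unique per insert, quality lies in [0, 1],
    and bucket_capacity >= 1.
    """

    def __init__(self, dim, num_tables=8, num_bits=6, bucket_capacity=4, seed=0):
        self.rng = random.Random(seed)
        # Random hyperplanes per table; the sign pattern of projections is the key.
        self.planes = [
            [[self.rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.capacity = bucket_capacity
        self.tables = [{} for _ in range(num_tables)]
        self.vectors = {}  # item id -> vector, kept while some bucket references it
        self.refs = {}     # item id -> number of buckets holding it

    def _key(self, t, vec):
        key = 0
        for plane in self.planes[t]:
            dot = sum(p * x for p, x in zip(plane, vec))
            key = (key << 1) | int(dot >= 0)
        return key

    def _release(self, item_id):
        # Drop the stored vector once no bucket references the item any more.
        self.refs[item_id] -= 1
        if self.refs[item_id] == 0:
            del self.refs[item_id]
            del self.vectors[item_id]

    def insert(self, item_id, vec, quality):
        """Insert into a random subset of tables whose size grows with quality."""
        n = min(len(self.tables), max(1, round(quality * len(self.tables))))
        self.vectors[item_id] = vec
        for t in self.rng.sample(range(len(self.tables)), n):
            bucket = self.tables[t].setdefault(self._key(t, vec), deque())
            if len(bucket) >= self.capacity:
                self._release(bucket.popleft())  # evict the oldest entry (freshness)
            bucket.append(item_id)
            self.refs[item_id] = self.refs.get(item_id, 0) + 1

    def query(self, vec, top_k=3):
        """Return up to top_k (similarity, id) pairs; hits are refreshed."""
        hits = {}  # item id -> buckets (table, key) where it was found
        for t in range(len(self.tables)):
            key = self._key(t, vec)
            for item_id in self.tables[t].get(key, ()):
                hits.setdefault(item_id, []).append((t, key))
        ranked = sorted(
            ((cosine(vec, self.vectors[i]), i) for i in hits), reverse=True
        )[:top_k]
        for _, item_id in ranked:
            for t, key in hits[item_id]:
                bucket = self.tables[t][key]
                bucket.remove(item_id)
                bucket.append(item_id)  # move to the fresh end: evicted later
        return ranked

    def size(self):
        """Total bucket entries, bounded by num_tables * 2**num_bits * capacity."""
        return sum(len(b) for table in self.tables for b in table.values())


if __name__ == "__main__":
    index = BoundedStreamLSH(dim=3, num_tables=4, num_bits=3, bucket_capacity=2, seed=1)
    index.insert("a", [1.0, 0.0, 0.0], quality=1.0)
    index.insert("b", [0.9, 0.1, 0.0], quality=0.5)
    index.insert("c", [0.0, 1.0, 0.0], quality=0.25)
    print(index.query([1.0, 0.05, 0.0], top_k=2))
```

Because every bucket holds at most `bucket_capacity` ids, this sketch never stores more than `num_tables * 2**num_bits * bucket_capacity` entries however long the stream runs; higher-quality items (present in more tables) and recently retrieved items simply survive eviction longer.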
Keywords
Similarity search, Stream search, Retention policy, Locality sensitive hashing, Dynamic popularity