DB-LSH 2.0: Locality-Sensitive Hashing With Query-Based Dynamic Bucketing

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING(2024)

引用 0|浏览0
暂无评分
摘要
Locality-sensitive hashing (LSH) is a promising family of methods for the high-dimensional approximate nearest neighbor (ANN) search problem due to its sub-linear query time and strong theoretical guarantee. Existing LSH methods either suffer from large index sizes and hash boundary problems, or incur a linear cost for high-quality candidate identification. This dilemma is addressed in a novel method called DB-LSH proposed in this paper. It organizes the projected spaces with multi-dimensional indexes instead of fixed-width hash buckets, which significantly reduces space costs. High-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based window queries. A novel incremental search strategy called DBI-LSH is also developed to further boost the query performance, which incrementally accesses the next best point for higher accuracy and efficiency. Considering the intermediate query information of each query, DBA-LSH is designed to adaptively tune termination conditions without scarifying the success probability. Our theoretical analysis proves that DB-LSH has a smaller query cost than the existing work while DBA-LSH and DBI-LSH have lower expected query costs than DB-LSH. An extensive range of experiments on real-world data show the superiority of our approaches over the state-of-the-art methods in both efficiency and accuracy.
更多
查看译文
关键词
Costs,Indexes,Search problems,Time complexity,Nearest neighbor methods,Hash functions,Extraterrestrial measurements,Approximate nearest neighbor search,high-dimensional spaces,locality sensitive hashing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要