HAP: An Efficient Hamming Space Index Based on Augmented Pigeonhole Principle

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), 2022

Abstract
Emerging deep learning techniques often map complex data objects (e.g., images, documents) to compact binary vectors (i.e., hash codes) for efficient similarity search. In this paper, we study the problem of indexing large-scale binary databases to support fast Hamming distance-based similarity queries. Existing Hamming space indices usually divide long binary vectors into short disjoint pieces and apply the Pigeonhole Principle to prune unnecessary candidates. In our work, we relax the disjoint partition constraint by allowing dimension redundancy, which yields a tighter pruning bound named the Augmented Pigeonhole Principle (APP). Intuitively, APP enables more optimization opportunities by capturing the correlation between database and query workloads. Based on APP, we propose HAP, an efficient Hamming space index framework that supports both Hamming range queries and k-NN queries. To guide index construction and run-time query optimization, we introduce a novel deep learning-based query cardinality estimator named SimCardNet. To further reduce the index space cost, we propose a learned index compression scheme that combines piece-wise linear approximation (PLA) and Elias-Fano encoding. We also study the problem of optimizing the execution time of a batch of queries using our index framework. The experimental results on large-scale binary databases show that our indexing scheme outperforms the state-of-the-art baselines in terms of both space and time efficiency.
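As background for the pruning step mentioned in the abstract, the following is a minimal sketch of the classic Pigeonhole Principle filter over disjoint pieces, which is the baseline the paper relaxes. It is not APP or HAP itself. If two codes differ in at most tau bits and each code is split into m disjoint pieces, at least one pair of corresponding pieces differs in at most floor(tau / m) bits. All names here (`split`, `build_index`, `neighbors`, `range_query`) are hypothetical, codes are assumed to be Python integers, and `n_bits` is assumed to be divisible by `m`.

```python
from collections import defaultdict
from itertools import combinations


def split(code, n_bits, m):
    """Split an n_bits-wide integer code into m disjoint, equal-width pieces."""
    width = n_bits // m  # assumes n_bits is divisible by m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]


def build_index(codes, n_bits, m):
    """Build one hash table per piece position: piece value -> list of code ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for i, piece in enumerate(split(code, n_bits, m)):
            tables[i][piece].append(idx)
    return tables


def neighbors(piece, width, radius):
    """Yield every width-bit value within Hamming distance `radius` of `piece`."""
    for r in range(radius + 1):
        for bits in combinations(range(width), r):
            value = piece
            for b in bits:
                value ^= 1 << b
            yield value


def range_query(codes, tables, query, n_bits, m, tau):
    """Return sorted ids of codes within Hamming distance tau of query."""
    width = n_bits // m
    radius = tau // m  # pigeonhole bound on the best-matching piece
    candidates = set()
    for i, piece in enumerate(split(query, n_bits, m)):
        for value in neighbors(piece, width, radius):
            candidates.update(tables[i].get(value, ()))
    # Verification: the filter only prunes, so each candidate is checked exactly.
    return sorted(
        idx for idx in candidates if bin(codes[idx] ^ query).count("1") <= tau
    )
```

The filter never discards a true result, so the verified candidate set equals the brute-force answer. How many candidates survive depends on the partitioning and the data distribution, which is the aspect the abstract says APP and the cardinality estimator aim to improve.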
Keywords
Hamming Space Index, Pigeonhole Principle, Similarity Search