A High-Performance Index for Real-Time Matrix Retrieval

IEEE Transactions on Knowledge and Data Engineering (2022)

Abstract
Embedding, a fundamental technique in machine learning, has had a significant impact on data representation. Examples include word embedding, image embedding, and audio embedding. With embedding techniques, many real-world objects can be represented as matrices. For example, a document can be represented by a matrix in which each row represents a word. Meanwhile, many applications continuously generate new data represented by matrices and require real-time query answering on that data. These continuously generated matrices need to be well managed for efficient retrieval. In this paper, we propose a high-performance index for real-time matrix retrieval. Besides fast query response, the index also supports real-time insertion by exploiting the log-structured merge-tree (LSM-tree). Since the index is built for matrices, it consumes much more memory and requires much more time to search than a traditional index for information retrieval. To tackle these challenges, we propose an index with precise and fuzzy inverted lists, and design a series of novel techniques to reduce the memory consumption and improve the search efficiency of the index. The proposed techniques include vector signature, vector residual sorting, hashing-based lookup, and dictionary initialization to guarantee the index quality. Comprehensive experimental results show that our proposed index can support real-time search on matrices and is more time- and memory-efficient than the state-of-the-art method.
Keywords
Indexing, search, matrices