A High Performance Text Vector Similarity Search Method Based on Overlapping Degree

Peng Zhao, Fan Yang, Zhibin Zhang, Jiafeng Guo, Xueqi Cheng

2019 International Conference on Data Mining Workshops (ICDMW) (2019)

Abstract

Similarity search over massive high-dimensional data is an important problem in text data computation and retrieval, and it strongly affects how efficiently and conveniently text can be searched and used. This paper proposes a high-performance text vector similarity search method and introduces the overlapping degree to describe the statistical characteristics of how query vectors recur in the inverted index. Based on the overlapping degree, candidate data sets are generated and sorted, and distances are computed following the order of the candidate sets, which avoids unnecessary calculations and reduces the scale of computation. As a result, vectors similar to a query vector can be found quickly, with very short index construction time and low memory overhead. Experimental results show that, compared with four representative methods, our method shortens index construction time by a factor of 22 to 223 and reduces index memory overhead by a factor of 2 to 18, while retaining a query speed advantage at high recall, which greatly improves the performance and efficiency of text similarity search.
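The abstract does not give the exact definition of the overlapping degree or the distance measure used. As a minimal sketch, assuming sparse text vectors stored as `{dimension: weight}` dicts, an overlapping degree defined as the number of the query's nonzero dimensions whose posting lists contain a given data vector, and Euclidean distance, the "generate candidates, sort by overlap, compute distances in that order" pipeline might look like this. The names `build_inverted_index`, `overlap_degrees`, `search`, and the `max_candidates` cutoff are hypothetical and for illustration only; this is not the authors' implementation.

```python
from collections import defaultdict
import math


def build_inverted_index(vectors):
    """Map each dimension to the ids of the vectors with a nonzero weight there."""
    index = defaultdict(list)
    for vid, vec in enumerate(vectors):
        for dim, weight in vec.items():
            if weight != 0:
                index[dim].append(vid)
    return index


def overlap_degrees(index, query):
    """Assumed definition: for each data vector, count how many of the
    query's nonzero dimensions list it in their posting lists."""
    counts = defaultdict(int)
    for dim, weight in query.items():
        if weight == 0:
            continue
        for vid in index.get(dim, ()):
            counts[vid] += 1
    return counts


def euclidean(a, b):
    """Euclidean distance between two sparse vectors (assumed metric)."""
    dims = set(a) | set(b)
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in dims))


def search(vectors, index, query, k=2, max_candidates=3):
    """Hypothetical search: rank candidates by overlapping degree, compute
    exact distances only for the top `max_candidates`, return the k nearest."""
    counts = overlap_degrees(index, query)
    # Highest overlap first; ties broken by id so the order is deterministic.
    ordered = sorted(counts, key=lambda vid: (-counts[vid], vid))
    scored = [(euclidean(vectors[vid], query), vid) for vid in ordered[:max_candidates]]
    scored.sort()
    return [vid for _, vid in scored[:k]]


# Toy data: four sparse "text vectors".
vectors = [
    {0: 1.0, 1: 1.0, 2: 1.0},  # id 0: shares all three query dimensions
    {0: 1.0, 3: 1.0},          # id 1: shares one
    {4: 1.0, 5: 1.0},          # id 2: shares none, never becomes a candidate
    {0: 1.0, 1: 1.0},          # id 3: shares two
]
index = build_inverted_index(vectors)
query = {0: 1.0, 1: 1.0, 2: 1.0}
print(search(vectors, index, query))  # [0, 3]
```

In this sketch, capping the candidate list at `max_candidates` is what trades recall for speed: vectors with a low overlapping degree are never scored, so a true neighbor that shares few dimensions with the query can be missed.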

Keywords

text retrieval, similarity search, overlapping degree, index construction