Metric Techniques for High-Dimensional Indexing

msra

引用 23|浏览13
暂无评分
摘要
Abstract Despite the proposal of numerous tree-based structures for high-dimensional similarity searches, techniques based on a sequential scan, such as the VA-File, have been shown to be quite effective. In this thesis we present three new access structures which use sequential access patterns to ef£ciently answer similarity queries for high-dimensional vector and metric data. Two of these access structures are designed to answer range queries, while the third access method is intended for nearest neighbor queries. The three access methods organize preprocessed data and reorder the original data sequentially on disk. At query time, portions of the data are read sequentially to prevent expensive random,disk I/Os that prevent many access methods for high-dimensional data and metric data from being ef£cient. Experimental results show the £rst two proposed access structures can process range queries up to 24 times faster than the VA-File and up to 69 times faster than a sequential scan of the data set. The third proposed method is designed for nearest neighbor queries and is up to 15 times faster than the VA-File method and more than 40 times faster than a sequential scan of the data set. Unlike the VA-File, the access structures presented in this thesis work in general metric spaces, in addition to vector space (under a metric distance), scale well with increasing the radius of range queries and the number,of nearest neighbors retrieved for nearest neighbor queries, and can be easily implemented.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要