The Merkurion approach for similarity searching optimization in Database Management Systems.

Data & Knowledge Engineering(2018)

引用 4|浏览24
暂无评分
摘要
Modern Database Management Systems (DBMSs) retrieve songs that resemble those in a music dataset, identify plagiarism in a set of documents, or provide past cases to physicians by taking into account the characteristics of a query exam. All such tasks require the comparison of data by similarity, which can be expressed in terms of distance-based queries in metric spaces. Traditional query processing relies mostly on histograms for describing the data distribution space and choosing a data retrieval path that quickly leads to the answer, discarding comparisons of most unwanted data. However, DBMSs still lack adequate support for selectivity estimation of query operators for data types embedded in metric spaces. This article addresses a novel strategy that extends the query optimizer of a DBMS, so that it can also perform both logical and physical query plan optimizations in searches that include similarity predicates. The proposal, named Merkurion, updates the concept of Data Distribution Space and captures data distributions according to the distances between the elements within a dataset. Moreover, it employs concise representations of such distributions, called synopses, for the definition of rules that enable similarity searching optimization. An extensive evaluation of Merkurion in real-world datasets has proven its effectiveness and broad applicability to many data domains.
更多
查看译文
关键词
Similarity searching,Query optimization,Selectivity estimation,Design and implementation techniques
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要