A New Concept Of Sets To Handle Similarity In Databases: The Simsets

SISAP 2013: Proceedings of the 6th International Conference on Similarity Search and Applications - Volume 8199 (2013)

Abstract
Traditional DBMSs rely heavily on the notion that a set never includes the same element twice. Modern applications, however, must deal with complex data, such as images, videos, and genetic sequences, in which an exact match between two elements seldom occurs and is generally meaningless. It therefore makes sense that sets of complex data should not include two elements that are "too similar". How can a concept equivalent to "sets" be defined for complex data? And how can novel algorithms be designed so that it embeds naturally into existing DBMSs? These are the issues we tackle in this paper, through the concept of "similarity sets", or SimSets for short. Several scenarios may benefit from SimSets. A typical example arises in sensor networks, where SimSets can identify sensors that recurrently report similar measurements, with the goal of turning some of them off to save energy. Specifically, our main contributions are: (i) highlighting the central properties of SimSets; (ii) proposing the basic algorithms required to create them from metric datasets, carefully designed to embed naturally into existing DBMSs; and (iii) evaluating their use in real-world applications to show that SimSets can improve data storage and retrieval, as well as the analysis process. We report experiments on real data from sensor networks within meteorological stations, providing a better conceptual underpinning for similarity search operations.
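The core idea described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual algorithm: the function name `build_simset`, the threshold parameter `epsilon`, and the greedy admission strategy are all assumptions made here to show how a set could exclude elements that are "too similar" under a metric distance.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_simset(points, epsilon, dist=euclidean):
    """Greedily build a similarity set: admit a point only if its
    distance to every element already kept is at least epsilon,
    so no two kept elements are 'too similar'."""
    simset = []
    for p in points:
        if all(dist(p, q) >= epsilon for q in simset):
            simset.append(p)
    return simset

# Hypothetical sensor readings (1-D temperatures); near-duplicate
# measurements collapse into a single representative.
readings = [(20.1,), (20.2,), (25.0,), (25.3,), (30.0,)]
print(build_simset(readings, epsilon=1.0))
# -> [(20.1,), (25.0,), (30.0,)]
```

Note that this greedy sketch is order-dependent: which representative survives depends on the input order, whereas the paper's algorithms are designed to integrate with metric indexing inside a DBMS.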
Keywords
Sensor Network, Real World Application, Complex Data, Cosine Similarity, Similarity Threshold