Coloring Embedder: A Memory Efficient Data Structure for Answering Multi-set Query

2019 IEEE 35th International Conference on Data Engineering (ICDE)(2019)

Abstract
Multi-set query is a fundamental problem in data science. When the multi-sets are large, exact matching methods such as hash tables need too much memory and cannot achieve high query speed. Bloom filters have recently been used to handle queries over big data, but they cannot achieve high accuracy when memory is tight. In this paper, we propose a new data structure named coloring embedder, which is fast, accurate, and memory efficient. The insight is to first map elements to a high-dimensional space to almost eliminate hashing collisions, and then use a dimensionality-reduced representation, similar to coloring a graph, to save memory. Theoretical proofs and experimental results show that, compared to the state of the art, the error rate of the coloring embedder is thousands of times smaller even with much less memory, and its query speed is about 2 times faster. The source code of coloring embedder is released on GitHub.
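The abstract describes the idea only at a high level. The following Python sketch is one plausible reading for the simplest case of two sets, A and B, and is not the authors' implementation. Each element is hashed to an edge (u, v) in a graph with many vertices, which stands in for the high-dimensional space. Elements of A require u and v to receive the same color, elements of B require different colors, and only the per-vertex colors are kept. The class name `ColoringEmbedderSketch`, the BLAKE2b-based edge hashing, the union-find merge, the greedy coloring, and the retry-with-a-new-seed loop are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict


class ColoringEmbedderSketch:
    """Hypothetical two-set query structure in the spirit of a coloring embedder.

    Assumption: each element maps to an edge (u, v) of a graph with m vertices.
    Set-A elements force equal colors on u and v, set-B elements force different
    colors, and only the per-vertex colors are stored.
    """

    def __init__(self, set_a, set_b, num_vertices=10_000, num_colors=4, max_tries=16):
        self.m = num_vertices
        self.k = num_colors
        set_a, set_b = list(set_a), list(set_b)  # may be iterated once per seed
        for seed in range(max_tries):
            self.seed = seed  # a new seed gives a new element-to-edge mapping
            colors = self._build(set_a, set_b)
            if colors is not None:
                self.colors = colors
                return
        raise RuntimeError("no valid coloring found; try more vertices or colors")

    def _edge(self, item):
        # Derive two vertex ids from one keyed BLAKE2b digest of the element.
        digest = hashlib.blake2b(str(item).encode(), digest_size=16,
                                 key=self.seed.to_bytes(8, "little")).digest()
        u = int.from_bytes(digest[:8], "little") % self.m
        v = int.from_bytes(digest[8:], "little") % self.m
        if u == v:
            v = (u + 1) % self.m  # a self-loop could never get two different colors
        return u, v

    def _build(self, set_a, set_b):
        parent = list(range(self.m))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        # Set A: both endpoints must share a color, so merge them into one component.
        for item in set_a:
            u, v = self._edge(item)
            parent[find(u)] = find(v)

        # Set B: endpoints must differ, so record a conflict between their components.
        conflicts = defaultdict(set)
        for item in set_b:
            u, v = self._edge(item)
            ru, rv = find(u), find(v)
            if ru == rv:
                return None  # set-A edges already force these endpoints to be equal
            conflicts[ru].add(rv)
            conflicts[rv].add(ru)

        # Greedily color the conflict graph, highest-degree components first.
        root_color = {}
        for root in sorted(conflicts, key=lambda r: len(conflicts[r]), reverse=True):
            used = {root_color[n] for n in conflicts[root] if n in root_color}
            free = [c for c in range(self.k) if c not in used]
            if not free:
                return None  # greedy ran out of colors with this seed
            root_color[root] = free[0]

        # Components without conflicts may take any color; use 0.
        return [root_color.get(find(x), 0) for x in range(self.m)]

    def query(self, item):
        """Return 'A' or 'B'; meaningful only for elements inserted into one of the sets."""
        u, v = self._edge(item)
        return "A" if self.colors[u] == self.colors[v] else "B"
```

A query hashes the element to its edge and compares two stored colors, so it reads only two array entries. Elements outside A ∪ B get an arbitrary answer, which matches a multi-set query that assumes the element belongs to one of the sets. A compact implementation would pack each color into log2(num_colors) bits instead of storing a Python int per vertex.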
Keywords
Data structures,Error analysis,Image color analysis,Big Data,Hash functions,Art,Memory management