Memory-Efficient Sequential Pattern Mining with Hybrid Tries
arXiv (Cornell University) (2022)
Abstract
As modern data sets continue to grow exponentially in size, the demand for
efficient mining algorithms capable of handling such large data sets becomes
increasingly imperative. This paper develops a memory-efficient approach for
Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery
that faces a well-known memory bottleneck for large data sets. Our methodology
involves a novel hybrid trie data structure that exploits recurring patterns to
compactly store the data set in memory; and a corresponding mining algorithm
designed to effectively extract patterns from this compact representation.
Numerical results on real-life test instances show an average improvement of
88% [...]
data sets compared to the state of the art. Furthermore, our algorithm stands
out as the only capable SPM approach for large data sets within 256GB of system
memory.
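
To make the idea of exploiting recurring patterns concrete, the sketch below stores a toy sequence database in an ordinary prefix trie, so that sequences sharing a prefix share nodes. This is only an illustration of the general principle: it is not the hybrid trie or the mining algorithm developed in the paper, and the names `TrieNode`, `build_trie`, `node_count`, and `prefix_support` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps an item to the child node reached by appending that item.
    children: dict = field(default_factory=dict)
    # Number of stored sequences whose prefix ends at this node.
    count: int = 0


def build_trie(sequences):
    """Insert every sequence into a prefix trie; shared prefixes share nodes."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for item in seq:
            if item not in node.children:
                node.children[item] = TrieNode()
            node = node.children[item]
            node.count += 1
    return root


def node_count(node):
    """Count the nodes below `node` (the root itself is not counted)."""
    return sum(1 + node_count(child) for child in node.children.values())


def prefix_support(root, prefix):
    """Return how many stored sequences start with `prefix`."""
    node = root
    for item in prefix:
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count


# Toy database (assumed for illustration only).
sequences = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"], ["b", "c"]]
root = build_trie(sequences)
stored_items = sum(len(seq) for seq in sequences)  # items in the raw database
trie_nodes = node_count(root)                      # nodes needed in the trie
```

In this toy database the raw representation holds 11 items while the trie needs 6 nodes, because the prefix `a, b` is stored once rather than three times. Note that `prefix_support` counts only sequences that begin with the given prefix; general sequential pattern support, where a pattern may occur anywhere as a subsequence, requires a dedicated mining procedure.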
Keywords
pattern, memory efficient