A randomized algorithm for clustering discrete sequences

Pattern Recognition (2024)

Abstract
Cluster analysis is one of the most important research topics in data mining and machine learning. To date, numerous clustering algorithms have been proposed for fixed-length vector data. In many real applications, however, clusters must be detected in a set of discrete sequences, where each sequence is an ordered list of items. Because such data are both sequential and discrete, clustering them is more challenging, and most existing vector data clustering algorithms cannot be applied directly. In this paper, we present a stochastic algorithm for clustering discrete sequences. Our method first quickly generates a set of random partitions over the sequential data set and then merges these random clustering results via weighted graph construction and partitioning. Extensive empirical comparisons on real data sets show that our method is comparable to state-of-the-art clustering algorithms with respect to both accuracy and efficiency.
Keywords
Sequence clustering, Sequential data analysis, Cluster analysis, Randomized algorithm
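
The abstract gives only the two-stage outline: generate many random partitions, then merge them through a weighted graph that is itself partitioned. The Python sketch below illustrates that general idea and is not the authors' algorithm. Every concrete choice in it is an assumption made for illustration: the function names, the bigram features, the Jaccard similarity, the random-pivot partitions, the 0.5 threshold, and the use of connected components as a stand-in for the graph partitioning step.

```python
import random
from itertools import combinations


def bigrams(seq):
    """Set of adjacent item pairs: a simple order-aware feature (assumed here)."""
    return {(seq[i], seq[i + 1]) for i in range(len(seq) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_partition(features, k, rng):
    """One random partition: assign each sequence to its most similar random pivot."""
    pivots = rng.sample(range(len(features)), k)
    labels = []
    for f in features:
        sims = [jaccard(f, features[p]) for p in pivots]
        best = max(range(k), key=lambda j: sims[j])
        # Sequences sharing nothing with any pivot go to a leftover cluster (-1).
        labels.append(best if sims[best] > 0 else -1)
    return labels


def cluster_sequences(seqs, n_partitions=30, k=2, threshold=0.5, seed=0):
    """Merge random partitions through a weighted co-association graph."""
    rng = random.Random(seed)
    n = len(seqs)
    features = [bigrams(s) for s in seqs]

    # Edge count = number of random partitions that put i and j together.
    together = {pair: 0 for pair in combinations(range(n), 2)}
    for _ in range(n_partitions):
        labels = random_partition(features, k, rng)
        for i, j in together:
            if labels[i] == labels[j]:
                together[(i, j)] += 1

    # Stand-in for graph partitioning: connected components of the graph
    # that keeps only edges whose weight reaches the threshold (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), count in together.items():
        if count / n_partitions >= threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    data = ["abab", "baba", "ababab", "xyxy", "yxyx", "xyxyxy"]
    print(cluster_sequences(data))  # prints [0, 0, 0, 1, 1, 1]
```

In this toy input the two groups use disjoint items and share identical bigram sets within each group, so every random partition keeps each group together and the result does not depend on the seed. On real data the edge weights would be noisier, and a proper weighted graph partitioning method would likely replace the thresholded connected components used here.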