Efficient weighted sequential pattern mining

Shaotao Chen, Jiahui Chen, Shicheng Wan

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Abstract
In real-life applications, data mining tasks involve extracting valuable but hidden information from massive data. How to find interesting patterns in large databases effectively remains an active research topic. Sequential pattern mining is the most popular approach in the data mining domain. Traditional sequential pattern mining research generally focuses on discovering frequent sequential patterns. However, the number of times a pattern occurs does not adequately indicate its importance. For instance, frequent patterns (e.g., pencil and eraser) may not be profitable, whereas infrequent patterns (e.g., extreme weather) may be high-risk. To extract more useful information, researchers study the weighted sequential pattern mining task. In this paper, an efficient algorithm for weighted sequential pattern mining, called EWSPM, is proposed. Two new strict upper bounds, namely MWEbound and MSRIWbound, are designed based on the concepts of maximum weight estimation (abbreviated as MWE) and maximum summation of remaining item weights (abbreviated as MSRIW), respectively. These upper bounds achieve better pruning and reduce the size of the search space during mining, which significantly shortens execution time. In addition, a database-projection method is employed to optimize memory usage, which mitigates potential memory explosion to a certain degree. Finally, extensive experiments were conducted on nine datasets (both real and synthetic). The experimental results demonstrate that EWSPM mines all interesting patterns efficiently and explores the smallest search space. The algorithm also exhibits superior performance in terms of execution time and memory consumption.
Keywords
Data mining, Sequential pattern mining, Remaining item, Weighted sequence, Tighter upper bound
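
The abstract does not give the definitions of MWEbound or MSRIWbound, so the following is only a minimal sketch of the general idea behind upper-bound pruning in weighted sequential pattern mining, not the EWSPM algorithm. It assumes single-item sequence elements, a weighted support defined as the average item weight of a pattern times its support (a common convention, assumed here), and a deliberately loose bound based on the largest item weight. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` is an order-preserving subsequence of `sequence`."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must occur in order.
    return all(item in it for item in pattern)


def support(db: List[Sequence[str]], pattern: Sequence[str]) -> int:
    """Count the sequences in the database that contain the pattern."""
    return sum(1 for seq in db if contains(seq, pattern))


def weighted_support(db: List[Sequence[str]], pattern: Sequence[str],
                     weights: Dict[str, float]) -> float:
    """Assumed definition: average item weight of the pattern times its support."""
    avg_weight = sum(weights[item] for item in pattern) / len(pattern)
    return avg_weight * support(db, pattern)


def loose_upper_bound(db: List[Sequence[str]], pattern: Sequence[str],
                      weights: Dict[str, float]) -> float:
    """Illustrative bound (NOT MWEbound or MSRIWbound): max item weight times support.

    Support never grows when a pattern is extended, and no average weight can
    exceed the maximum weight, so this value bounds the weighted support of
    the pattern and of every extension of it.
    """
    return max(weights.values()) * support(db, pattern)


def can_prune(db: List[Sequence[str]], pattern: Sequence[str],
              weights: Dict[str, float], min_wsup: float) -> bool:
    """A pattern and all its extensions can be skipped if the bound is below the threshold."""
    return loose_upper_bound(db, pattern, weights) < min_wsup


# Hypothetical toy data for illustration only.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
weights = {"a": 0.2, "b": 0.5, "c": 0.9}
```

With this toy data, `can_prune(db, ["a", "c"], weights, min_wsup=2.0)` reports that the pattern and its extensions can be skipped. A tighter bound prunes more candidates earlier, which is the effect the abstract attributes to the two proposed bounds.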