
RHUPS: Mining Recent High Utility Patterns with Sliding Window-based Arrival Time Control over Data Streams

ACM Transactions on Intelligent Systems and Technology (TIST) (2021), SCI Zone 3

Sejong Univ | Western Norway Univ Appl Sci | Ho Chi Minh City Univ Technol HUTECH | Univ Alberta

Abstract
Real-world databases have several distinctive characteristics. New data is inserted continuously over time with no limit on the length of the database, the items that make up the database carry a variety of information, and recently generated data has a greater influence than older data. Databases with these properties are called time-sensitive non-binary stream databases; examples include web-server click data, market sales data, sensor network data, and network traffic measurements. Many high utility pattern mining and stream pattern mining methods have been proposed so far. However, they are not well suited to analyzing these databases, because they find valid patterns by considering only some of the features described above. Therefore, knowledge-based software that efficiently finds meaningful information in databases with these characteristics is required. In this article, we propose an intelligent information system that calculates the influence of the insertion time of each batch in a large-scale stream database by applying the sliding window model and mines recent high utility patterns without generating candidate patterns. In addition, a novel list-based data structure is suggested for fast and efficient management of time-sensitive stream databases. Moreover, our technique is compared with state-of-the-art algorithms through various experiments using real and synthetic datasets. The experimental results show that our approach outperforms the previously proposed methods in terms of runtime, memory usage, and scalability.
Key words
Recent high utility pattern, stream database, sliding window, evolutionary time-fading factor

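[Key points]: This paper proposes a method, based on the sliding window model and a novel list-based data structure, for mining recent high utility patterns in time-sensitive non-binary data streams, improving the performance and efficiency of processing such streams.

[Methods]: The sliding window model is applied to control the influence of each data batch's insertion time, high utility patterns are mined directly without generating candidate patterns, and a novel list-based data structure suited to managing time-sensitive data streams is proposed.

[Experiments]: The authors ran experiments on real and synthetic datasets and compared the method with existing state-of-the-art algorithms; the results show that it outperforms previous methods in runtime, memory usage, and scalability.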
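
To make the idea of a sliding window with arrival-time influence concrete, here is a minimal Python sketch. It is not the RHUPS algorithm: it enumerates itemsets by brute force, whereas the paper mines without candidate generation using a list-based structure that is not described on this page. The class name `SlidingWindowUtility`, the exponential `decay` weighting, and the `max_len` cap are all assumptions made for illustration; the paper's actual time-fading formula may differ.

```python
from collections import deque
from itertools import combinations


class SlidingWindowUtility:
    """Toy recent-utility miner over a sliding window of batches.

    Illustrative only: brute-force enumeration, not the paper's
    candidate-free, list-based method.
    """

    def __init__(self, window_size, decay):
        # deque with maxlen drops the oldest batch when a new one arrives
        self.window = deque(maxlen=window_size)
        # assumed decay factor in (0, 1]; older batches get smaller weights
        self.decay = decay

    def add_batch(self, batch):
        # batch: list of transactions, each a dict mapping item -> utility
        self.window.append(batch)

    def recent_utilities(self, max_len=2):
        """Sum time-weighted utilities of all itemsets up to max_len items."""
        totals = {}
        newest = len(self.window) - 1
        for index, batch in enumerate(self.window):
            # newest batch has weight 1; each step back multiplies by decay
            weight = self.decay ** (newest - index)
            for tx in batch:
                items = sorted(tx)
                for k in range(1, max_len + 1):
                    for itemset in combinations(items, k):
                        utility = sum(tx[i] for i in itemset)
                        totals[itemset] = totals.get(itemset, 0.0) + weight * utility
        return totals

    def mine(self, min_util, max_len=2):
        """Return itemsets whose time-weighted utility reaches min_util."""
        return {
            pattern: utility
            for pattern, utility in self.recent_utilities(max_len).items()
            if utility >= min_util
        }


miner = SlidingWindowUtility(window_size=2, decay=0.5)
miner.add_batch([{"a": 4, "b": 2}])  # slides out once the third batch arrives
miner.add_batch([{"a": 2}])
miner.add_batch([{"b": 6, "c": 2}])
print(miner.mine(min_util=5))
```

In this usage example the first batch has left the window by the time of the query, and the item `a` in the older remaining batch is down-weighted, so only patterns from the newest batch pass the threshold.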