Ensemble Direct Density Ratio Estimation For Multistream Classification

2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018

Abstract
Traditional machine learning assumes that training data drawn from the same stationary distribution as the test data is readily available. In practice, this assumption does not hold because obtaining ground-truth labels for data instances is costly. This is particularly true when computing over non-stationary data streams. Recent studies in the multistream setting aim to address this issue by leveraging a stream of biased labeled instances (called the source stream) to train a model for prediction over unlabeled instances (called the target stream). They use sampling bias correction as a preprocessing step: source instance weights are estimated and then used to train a bias-corrected classifier that predicts the labels of target data instances. In this regard, a recent framework proposes a Gaussian kernel model to estimate source instance weights. Unfortunately, it suffers from high computational time complexity and consequently lowers the rate at which streaming data can be processed. In this paper, we address this issue by proposing a divide-and-conquer method suitable for simultaneously evaluating source instance weights and detecting changes in distribution over time. Our empirical results demonstrate a significant improvement in execution time compared to the previous approach, while achieving similar or better classification accuracy on real-world datasets.
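To make the idea of estimating source instance weights with a Gaussian kernel model concrete, here is a minimal sketch. It is not the paper's algorithm or the earlier framework's estimator: it assumes a least-squares fit (uLSIF) because that variant has a closed-form solution, and the function names, the bandwidth `sigma`, the regularizer `lam`, and the synthetic one-dimensional streams are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    # Gaussian kernel values between every row of x and every center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif_weights(source, target, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    # Assumption: uLSIF stands in for the weight estimator here.
    # The density ratio p_target(x) / p_source(x) is modeled as a
    # non-negative combination of Gaussian kernels centered on target points.
    rng = np.random.default_rng(seed)
    n = min(n_centers, len(target))
    centers = target[rng.choice(len(target), size=n, replace=False)]

    phi_src = gaussian_kernel(source, centers, sigma)
    phi_tgt = gaussian_kernel(target, centers, sigma)

    # Closed-form regularized least-squares solution for the kernel coefficients.
    H = phi_src.T @ phi_src / len(source)
    h = phi_tgt.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(n), h)
    alpha = np.maximum(alpha, 0.0)  # keep the estimated ratio non-negative

    # Importance weight of each source instance.
    return phi_src @ alpha

# Hypothetical data: a biased source sample and a shifted target sample.
rng = np.random.default_rng(42)
source = rng.normal(0.0, 1.0, size=(500, 1))
target = rng.normal(1.0, 1.0, size=(500, 1))
weights = ulsif_weights(source, target)
```

Source instances that fall where the target stream is dense receive larger weights, and those weights can then be passed as per-sample weights when training the classifier. The cost of fitting such a kernel model on every window of the stream is the kind of overhead the abstract describes as limiting throughput.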
Keywords
data stream classification, ensemble methods, sampling bias, concept drift