Recognizing input space and target concept drifts in data streams with scarcely labeled and unlabelled instances.

Edwin Lughofer, Eva Weigl, Wolfgang Heidl, Christian Eitzinger, Thomas Radauer

Inf. Sci. (2016)

Abstract

In classification-based stream mining, drift detection is essential in order to (i) inform operators when unintended system changes occur and (ii) make classifier updates more flexible when changes are intentional. Current detection approaches usually rely on the assumption that fully supervised labeled streams are available for monitoring (the changes in) classifier performance. This is an unrealistic scenario in many on-line real-world applications, as true class labels would have to be known, which usually requires tedious feedback efforts from the operators working with the systems. We propose two techniques to improve the economy and applicability of current drift detection techniques: (i) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising classifier performance and (ii) a fully unsupervised approach based on the degree of overlap between a classifier's output certainty distributions, which can be applied to any unlabeled classification stream. For both variants, a specific handling of imbalanced class distributions in the streams is proposed, so that possible downtrends in classifier behavior for under-represented classes can also be observed. The statistical monitoring of classifier behavior relies on a modified version of the Page-Hinkley test, in which a fading factor and an automatic thresholding concept (based on the Hoeffding bound) are introduced to render it more flexible for detecting successive drift occurrences in a stream. We compared our approaches to the fully supervised variant in two real-world on-line applications, including a systematic analysis of the capabilities of our methods. The semi-supervised approach was able to detect real as well as artificially built-in drifts in these streams with a similar delay (of about 5-6 min) as the supervised variant, while using only 20% actively selected samples. The unsupervised variant was able to detect input space drifts with reasonable delays as well, but failed to detect target concept drifts. Using both approaches in tandem therefore allows us to distinguish between input space and target concept drifts.
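
To make the monitoring idea concrete, the following is a minimal Python sketch of a Page-Hinkley test with a fading factor, applied to a stream of performance indicator values (for example, a running error rate). It is an illustration under stated assumptions, not the paper's formulation: the class name `FadingPageHinkley`, the helper `first_drift_index`, the parameter names and their default values are all hypothetical. The abstract does not say how the threshold is derived from the Hoeffding bound, so the threshold stays a plain parameter here and `hoeffding_bound` only shows the standard bound itself.

```python
import math


def hoeffding_bound(value_range, n, confidence):
    """Standard Hoeffding bound: with probability 1 - confidence, the mean of
    n samples in a range of width value_range is within this distance of the
    true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / confidence) / (2.0 * n))


class FadingPageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    The cumulative deviation is multiplied by a fading factor at every step,
    so old evidence is gradually forgotten (hypothetical parameter names).
    """

    def __init__(self, delta=0.005, fading=0.999, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.fading = fading        # forgetting factor in (0, 1]
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # faded cumulative deviation
        self.cum_min = 0.0          # smallest cumulative deviation seen

    def update(self, x):
        """Consume one value; return True if a drift alarm is raised."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum = self.fading * self.cum + (x - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


def first_drift_index(stream, detector):
    """Return the index of the first alarm, or None if no alarm is raised."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic indicator: low error for 200 steps, then an abrupt increase.
    stream = [0.1] * 200 + [0.9] * 200
    print(first_drift_index(stream, FadingPageHinkley()))
```

In this sketch a stable stream never raises an alarm, while an abrupt increase in the monitored indicator triggers one a few samples after the change. A smaller `fading` value forgets old evidence faster, which suits repeated detection in the same stream but makes the test less sensitive to slow, gradual drifts.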

Keywords

Data stream classification, Input space and target concept drift, Drift detection, Scarcely labeled and unlabeled streams, Semi-supervised and unsupervised performance indicators, Single-pass active learning filter
