Semi-supervised Learning over Streaming Data using MOA

Langshi Chen, Jiayu Li, Cenk Sahinalp, Madhav Marathe, Anil Vullikanti, Andrey Nikolaev

user-5ca99f0c530c702a92b1df51 (2019)

Citations: 7 | Views: 82


Abstract

Machine learning algorithms for data streams usually assume that all data examples available for learning are labeled. Unfortunately, in real-world scenarios, data examples are not always labeled. Semi-supervised learning, which learns from labeled and unlabeled data at the same time, is a challenging task. It is especially relevant in the context of data streams, where data is generated in real time and labels may be missing for various reasons (e.g., network delays, errors in communication between sensors, or an expensive labeling process). In this paper, we present two novel approaches to handling missing labels in data stream classification: cluster-and-label and self-training. We discuss the strengths and weaknesses of each approach to establish a baseline for evaluating semi-supervised learning techniques on data streams. Both methods are implemented in the open-source MOA (Massive Online Analysis) software as an internal benchmark component, so that researchers can easily run experimental comparisons of semi-supervised learning on data streams.
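Self-training is one of the two approaches named above. As a rough illustration only (this is not the authors' MOA implementation, whose classes and parameters are not described here), the following Python sketch shows the general idea on a stream in which unlabeled examples arrive with `y = None`. Labeled examples update the model directly. An unlabeled example is predicted, and it is added with its predicted (pseudo-)label only when the model's confidence reaches a threshold. The nearest-centroid base learner, the softmax-over-distances confidence score, and the names `IncrementalNearestCentroid`, `self_training_stream`, and `threshold` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


class IncrementalNearestCentroid:
    """Hypothetical online base learner: keeps a running mean vector per class."""

    def __init__(self) -> None:
        self.sums: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = defaultdict(int)

    def learn(self, x: Sequence[float], y: str) -> None:
        # Update the per-class running sum; the centroid is sum / count.
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            self.sums[y][i] += v
        self.counts[y] += 1

    def predict_with_confidence(self, x: Sequence[float]) -> Tuple[Optional[str], float]:
        if not self.sums:
            return None, 0.0  # nothing learned yet
        # Euclidean distance from x to each class centroid.
        dists = {
            y: math.sqrt(sum((xi - si / self.counts[y]) ** 2 for xi, si in zip(x, s)))
            for y, s in self.sums.items()
        }
        # Crude confidence (an assumption): softmax over negative distances,
        # shifted by the max for numerical stability.
        m = max(-d for d in dists.values())
        weights = {y: math.exp(-d - m) for y, d in dists.items()}
        best = min(dists, key=dists.get)
        return best, weights[best] / sum(weights.values())


def self_training_stream(
    stream: Iterable[Tuple[Sequence[float], Optional[str]]], threshold: float = 0.9
) -> Tuple[IncrementalNearestCentroid, int]:
    """Process the stream once; return the model and the number of pseudo-labels used."""
    model = IncrementalNearestCentroid()
    n_pseudo = 0
    for x, y in stream:
        if y is not None:
            model.learn(x, y)  # labeled example: ordinary incremental update
        else:
            pred, conf = model.predict_with_confidence(x)
            if pred is not None and conf >= threshold:
                model.learn(x, pred)  # confident prediction reused as a label
                n_pseudo += 1
    return model, n_pseudo


if __name__ == "__main__":
    stream = [
        ([0.0, 0.0], "a"),
        ([10.0, 10.0], "b"),
        ([0.5, 0.2], None),   # close to "a": pseudo-labeled
        ([9.8, 10.1], None),  # close to "b": pseudo-labeled
        ([5.0, 5.0], None),   # ambiguous: skipped
    ]
    model, n_pseudo = self_training_stream(stream)
    print(n_pseudo)  # 2
```

The threshold controls the usual self-training trade-off: a low value lets more unlabeled data in but risks reinforcing the model's own mistakes, while a high value is safer but ignores most unlabeled examples.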


Keywords

semi-supervised learning, data streams
