Self-taught learning (2009)


Abstract
We introduce a new machine learning framework called self-taught learning for using unlabeled data in supervised classification tasks. This framework does not require that the unlabeled data follow the class labels of the supervised task or arise from the same generative distribution. Such unlabeled data is often significantly easier to obtain than the unlabeled data required by previously studied frameworks such as semi-supervised learning. In this thesis, we demonstrate that self-taught learning can be applied successfully to a variety of hard machine learning problems.

The centerpiece of our work is a self-taught learning algorithm based on an optimization problem called "sparse coding." This algorithm uses unlabeled data to learn a new representation for complex, high-dimensional inputs, and then applies supervised learning over this representation. The representation captures higher-level aspects of the input and significantly improves classification performance on many test domains, including computer vision, audio recognition, and text classification. We present efficient sparse coding algorithms for a translation-invariant version of the model that can be applied to audio and image data. We also generalize the model to a much broader class of inputs, including domains that are hard to handle with previous algorithms, and apply the model to text classification and a robotic perception task. Taken together, these experiments demonstrate that using the self-taught learning framework, machine learning can be applied to much harder problems than previously possible.

These self-taught learning algorithms work best when they are allowed to learn rich models (with millions of parameters) using large amounts of unlabeled data (millions of examples). Unfortunately, with current methods, it can take weeks to learn such rich models. Further, these methods require fast, sequential updates and, with current algorithms, are not conducive to being parallelized on a distributed cluster. To apply self-taught learning to such large-scale problems, we show that graphics processor hardware (available in most modern desktops) can be used to massively parallelize the algorithms. Using a new, inherently parallel algorithm, sparse coding can be easily implemented on graphics processors, and we show that this reduces the learning time from about three weeks to a single day.

Finally, we consider self-taught learning methods that learn hierarchical representations using unlabeled data. We develop general principles for unsupervised learning of such hierarchical models using graphics processors, and show that the slow learning algorithms for the popular deep belief network model can be successfully parallelized. This implementation is up to 70 times faster than an optimized CPU implementation, reduces the learning time from weeks to hours, and represents the state-of-the-art in learning large deep belief networks.
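To make the pipeline described above concrete, here is a minimal sketch of the self-taught learning recipe (learn sparse features from unlabeled data, re-encode the labeled data, then train an ordinary classifier). It uses scikit-learn's dictionary learning as a stand-in for the thesis's sparse coding algorithm; the synthetic data, array shapes, and parameter values are illustrative placeholders, not the setup used in the thesis.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Placeholder data: in self-taught learning, the unlabeled pool need not
# share class labels or a generative distribution with the labeled task.
X_unlabeled = rng.randn(5000, 64)                                # plentiful unlabeled examples
X_train, y_train = rng.randn(200, 64), rng.randint(0, 2, 200)    # scarce labeled data
X_test, y_test = rng.randn(100, 64), rng.randint(0, 2, 100)

# Step 1: learn a set of basis vectors (a "dictionary") from unlabeled data
# via sparse coding.
coder = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
coder.fit(X_unlabeled)

# Step 2: re-represent the labeled examples as sparse activations over the
# learned bases, i.e. the higher-level features.
Z_train = coder.transform(X_train)
Z_test = coder.transform(X_test)

# Step 3: ordinary supervised learning on the new representation.
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```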
Keywords
image data, text classification, machine learning, graphics processor, unsupervised learning, semi-supervised learning, rich model, unlabeled data, self-taught learning, slow learning algorithm