Exponential Family Sparse Coding With Applications To Self-Taught Learning

International Joint Conference on Artificial Intelligence (2009)

Abstract
Sparse coding is an unsupervised learning algorithm for finding concise, slightly higher-level representations of inputs, and has been successfully applied to self-taught learning, where the goal is to use unlabeled data to help on a supervised learning task, even if the unlabeled data cannot be associated with the labels of the supervised task [Raina et al., 2007]. However, sparse coding uses a Gaussian noise model and a quadratic loss function, and thus performs poorly if applied to binary-valued, integer-valued, or other non-Gaussian data, such as text. Drawing on ideas from generalized linear models (GLMs), we present a generalization of sparse coding to learning with data drawn from any exponential family distribution (such as Bernoulli or Poisson). This gives a method that we argue is much better suited than Gaussian sparse coding to modeling such data types. We present an algorithm for solving the L1-regularized optimization problem defined by this model, and show that it is especially efficient when the optimal solution is sparse. We also show that the new model results in significantly improved self-taught learning performance when applied to text classification and to a robotic perception task.
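To make the generalization concrete, the sketch below infers sparse activations for binary data under a Bernoulli noise model: the natural parameter is B @ s, and the per-example objective is the Bernoulli negative log-likelihood a(Bs) - x^T(Bs) plus an L1 penalty on s, where a is the log-partition function. The function name `bernoulli_sparse_codes` and the use of proximal gradient descent (ISTA) are illustrative assumptions; the paper describes its own specialized solver for the L1-regularized problem.

```python
import numpy as np

def bernoulli_sparse_codes(X, B, beta=0.1, lr=0.1, n_iters=500):
    """Infer sparse activations S for binary data X under a Bernoulli
    exponential-family sparse coding model with natural parameter B @ S.

    Minimizes, per column x of X:
        a(B s) - x^T (B s) + beta * ||s||_1,
    where a(eta) = sum(log(1 + exp(eta))) is the Bernoulli log-partition.

    Uses proximal gradient descent (ISTA) as a simple stand-in for the
    paper's specialized L1-regularized solver (hypothetical choice).
    """
    k = B.shape[1]
    n = X.shape[1]
    S = np.zeros((k, n))
    for _ in range(n_iters):
        eta = B @ S                        # natural parameters, d x n
        mu = 1.0 / (1.0 + np.exp(-eta))    # Bernoulli mean = gradient of a(eta)
        grad = B.T @ (mu - X)              # gradient of the negative log-likelihood
        S = S - lr * grad                  # gradient step
        # soft-thresholding: proximal operator of the L1 penalty
        S = np.sign(S) * np.maximum(np.abs(S) - lr * beta, 0.0)
    return S

# Toy usage: 20-dimensional binary inputs, 10 basis vectors.
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 10))
X = (rng.random((20, 5)) < 0.3).astype(float)
S = bernoulli_sparse_codes(X, B)
print("nonzero activations per example:", (np.abs(S) > 1e-8).sum(axis=0))
```

Swapping in a different exponential family only changes the mean function `mu` (the gradient of the log-partition); for Gaussian noise, `mu = eta` recovers standard sparse coding.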
Keywords
unlabeled data, sparse coding, data type, non-Gaussian data, supervised learning task, unsupervised learning algorithm, Gaussian noise model, generalized linear model, new model result, robotic perception task, exponential family sparse coding