Shrinkage Fisher Information Embedding Of High Dimensional Feature Distributions

2011 Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems & Computers (ASILOMAR), 2011

Abstract
In this paper, we introduce a dimensionality reduction method that can be applied to clustering of high-dimensional empirical distributions. The proposed approach is based on a stabilized information-geometric representation of the feature distributions. The problem of dimensionality reduction on spaces of distribution functions arises in many applications, including hyperspectral imaging, document clustering, and classifying flow cytometry data. Our method is a shrinkage-regularized version of the Fisher information distance, which we call shrinkage FINE (sFINE); it is implemented by Steinian shrinkage estimation of the matrix of Kullback-Leibler distances between feature distributions. The proposed method involves computing similarities using the shrinkage-regularized Fisher information distance between probability density functions (PDFs) of the data features, then applying Laplacian eigenmaps to the derived similarity matrix to accomplish the embedding and perform clustering. The shrinkage regularization controls the trade-off between bias and variance and is especially well suited for clustering empirical probability distributions of high-dimensional data sets. We also show significant gains in clustering performance on both a UCI data set and a spam data set. Finally, we demonstrate the superiority of embedding and clustering distributional data using sFINE compared to other state-of-the-art methods such as non-parametric information clustering, support vector machines (SVMs), and sparse K-means.
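The pipeline described above (pairwise Kullback-Leibler distances between estimated feature PDFs, Steinian shrinkage of the resulting distance matrix, then Laplacian eigenmaps for the embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes univariate Gaussian fits to each empirical distribution, uses a simple convex shrinkage toward the grand-mean distance as a stand-in for the paper's Steinian estimator, and the parameters `lam` and `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kl(mu0, var0, mu1, var1):
    # KL divergence KL(N(mu0,var0) || N(mu1,var1)) between univariate Gaussians
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def skl_matrix(samples):
    # Symmetrized KL distance matrix between Gaussian fits of each sample set
    mus = np.array([s.mean() for s in samples])
    vs = np.array([s.var() + 1e-12 for s in samples])
    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = (gaussian_kl(mus[i], vs[i], mus[j], vs[j])
                       + gaussian_kl(mus[j], vs[j], mus[i], vs[i]))
    return D

def shrink(D, lam):
    # Stand-in for Steinian shrinkage: pull the off-diagonal distances
    # toward their grand mean, keeping the diagonal at zero
    target = np.full_like(D, D.mean())
    np.fill_diagonal(target, 0.0)
    return (1.0 - lam) * D + lam * target

def laplacian_eigenmaps(D, dim=2, sigma=1.0):
    # Gaussian-kernel similarity from distances, then the bottom nontrivial
    # eigenvectors of the symmetrically normalized graph Laplacian
    W = np.exp(-D / (2.0 * sigma ** 2))
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    Lsym = Dinv @ (np.diag(d) - W) @ Dinv
    vals, vecs = np.linalg.eigh(Lsym)
    return vecs[:, 1:dim + 1]  # skip the trivial constant eigenvector

# Toy demo: two groups of empirical feature distributions differing in mean
rng = np.random.default_rng(0)
samples = ([rng.normal(0.0, 1.0, 200) for _ in range(5)]
           + [rng.normal(5.0, 1.0, 200) for _ in range(5)])
D_raw = skl_matrix(samples)
D = shrink(D_raw, lam=0.2)
Y = laplacian_eigenmaps(D, dim=2)
print(Y.shape)  # one 2-D embedding coordinate per input distribution
```

A clustering step (e.g. K-means on the rows of `Y`) would complete the sketch; the shrinkage weight `lam` plays the role of the bias-variance trade-off discussed in the abstract.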
Keywords
distribution function, support vector machine, empirical distribution, k-means, probability distribution, Fisher information, probability density function, flow cytometry, high-dimensional data, document clustering