AI-extracted summary of this paper
This paper presents Deep Embedded Clustering (DEC), an algorithm that clusters a set of data points in a jointly optimized feature space.
Unsupervised Deep Embedding for Clustering Analysis
International Conference on Machine Learning (ICML), 2016
Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks.
- Clustering, an essential data analysis and visualization tool, has been studied extensively in unsupervised machine learning from different perspectives: What defines a cluster? What is the right distance metric? How to efficiently group instances into clusters? How to validate clusters? And so on.
- Little work has focused on the unsupervised learning of the feature space in which to perform clustering.
- A notion of distance or dissimilarity is central to data clustering algorithms.
- This, in turn, relies on representing the data in a feature space.
- The k-means clustering algorithm (MacQueen et al, 1967), for example, uses the Euclidean distance between points in a given feature space, which for images might be raw pixels or gradient-orientation histograms.
- Numerous different distance functions and embedding methods have been explored in the literature
- We show qualitative and quantitative results that demonstrate the benefit of Deep Embedded Clustering compared to LDGMI and SEC
- Table 2 fragment: clustering accuracy of k-means, LDMGI, SEC, and DEC variants on MNIST, STL-HOG, REUTERS-10k, and REUTERS; DEC w/o backprop reaches 79.82% and 34.06% on the first two columns.
- The authors computed tf-idf features on the 2000 most frequently occurring word stems.
- Since some algorithms do not scale to the full Reuters dataset, the authors sampled a random subset of 10000 examples, which they call REUTERS-10k, for comparison purposes.
- A summary of dataset statistics is shown in Table 1.
- The authors normalize all datasets so that (1/d)‖xi‖₂² is approximately 1, where d is the dimensionality of the data space point xi ∈ X.
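As a rough illustration of this preprocessing step, the sketch below rescales a data matrix by a single global constant so that the average of (1/d)‖xi‖₂² over the dataset is about 1; the function name is illustrative and the exact per-dataset constant used by the authors is not reproduced here.

```python
import numpy as np

def normalize_dataset(X):
    """Rescale X (n, d) by one global constant so that the mean of
    (1/d) * ||x_i||_2^2 over all points is approximately 1."""
    d = X.shape[1]
    mean_sq_norm = np.mean(np.sum(X ** 2, axis=1)) / d  # average (1/d)*||x_i||^2
    return X / np.sqrt(mean_sq_norm)
```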
- Evaluation Metric
- The authors use the standard unsupervised evaluation metric and protocols for evaluation and comparison with other algorithms (Yang et al, 2010).
- The accuracy is ACC = maxₘ Σᵢ 1{li = m(ci)} / n, where li is the ground-truth label, ci is the cluster assignment produced by the algorithm, and m ranges over all possible one-to-one mappings between clusters and labels.
- This metric takes a cluster assignment from an unsupervised algorithm and a ground truth assignment and finds the best matching between them.
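A minimal implementation sketch of this metric, assuming integer-coded labels starting at 0 and SciPy's Hungarian solver (scipy.optimize.linear_sum_assignment); the function name is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy: the best one-to-one match
    between predicted clusters and ground-truth labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    # Contingency table: w[c, l] counts points in predicted cluster c
    # whose ground-truth label is l.
    w = np.zeros((n, n), dtype=np.int64)
    for c, l in zip(y_pred, y_true):
        w[c, l] += 1
    # The Hungarian algorithm (Kuhn, 1955) finds the mapping m that
    # maximizes the number of correctly matched points.
    row_ind, col_ind = linear_sum_assignment(-w)
    return w[row_ind, col_ind].sum() / y_true.size
```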
- The underlying assumption of DEC is that the initial classifier’s high confidence predictions are mostly correct
- To verify that this assumption holds for the task and that the choice of P has the desired properties, the authors plot the magnitude of the gradient of L with respect to each embedded point, |∂L/∂zi|, against its soft assignment, qij, to a randomly chosen cluster.
- Table 3 fragment: AE+LDMGI 83.98%, 32.04%.
- This paper presents Deep Embedded Clustering (DEC), an algorithm that clusters a set of data points in a jointly optimized feature space.
- DEC has the virtue of linear complexity in the number of data points which allows it to scale to large datasets
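The quantities at the heart of DEC can be written down compactly. The sketch below is a NumPy rendering of the soft assignment qij (a Student's t kernel between embedded points and cluster centres), the sharpened target distribution pij used as P, and the KL-divergence objective. It is a standalone illustration, not the authors' Caffe implementation; in DEC these quantities are computed on the autoencoder's embedding and optimized jointly with the encoder weights by backpropagation.

```python
import numpy as np

def soft_assignment(z, mu, alpha=1.0):
    """Student's t-kernel similarity between embedded points z (n, d)
    and cluster centres mu (k, d); each row of q sums to 1."""
    dist_sq = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)      # (n, k)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q and renormalize by cluster frequency,
    so that high-confidence assignments are emphasized."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """KL(P || Q), the clustering objective minimized during training."""
    return float((p * np.log(p / q)).sum())
```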
- Table 1: Dataset statistics (# points, # classes, dimension).
- Table 2: Comparison of clustering accuracy (Eq. 10) on four datasets.
- Table 3: Comparison of clustering accuracy (Eq. 10) on autoencoder (AE) features.
- Table 4: Clustering accuracy (Eq. 10) on an imbalanced subsample of MNIST.
- Clustering has been extensively studied in machine learning in terms of feature selection (Boutsidis et al, 2009; Liu & Yu, 2005; Alelyani et al, 2013), distance functions (Xing et al, 2002; Xiang et al, 2008), grouping methods (MacQueen et al, 1967; Von Luxburg, 2007; Li et al, 2004), and cluster validation (Halkidi et al, 2001). Space does not allow for a comprehensive literature study and we refer readers to (Aggarwal & Reddy, 2013) for a survey.
One branch of popular methods for clustering is k-means (MacQueen et al, 1967) and Gaussian Mixture Models (GMM) (Bishop, 2006). These methods are fast and applicable to a wide range of problems. However, their distance metrics are limited to the original data space, and they tend to be ineffective when the input dimensionality is high (Steinbach et al, 2004).
Several variants of k-means have been proposed to address issues with higher-dimensional input spaces. De la Torre & Kanade (2006); Ye et al (2008) perform joint dimensionality reduction and clustering by first clustering the data with k-means and then projecting the data into a lower-dimensional space where the inter-cluster variance is maximized. This process is repeated in EM-style iterations until convergence. However, this framework is limited to linear embedding; our method employs deep neural networks to perform the non-linear embedding that is necessary for more complex data.
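To make the alternation concrete, here is a toy sketch, not De la Torre & Kanade's exact formulation: cluster with k-means, fit a linear discriminant projection on the current cluster labels as a stand-in for the variance-maximizing linear embedding, re-cluster in the projected space, and repeat until the assignment stops changing. Function and parameter names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def linear_embed_and_cluster(X, n_clusters, n_iter=10):
    """EM-style alternation between k-means clustering and a linear
    projection that emphasizes inter-cluster variance (LDA here)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    for _ in range(n_iter):
        # Linear embedding of the original data, fit on the current labels.
        lda = LinearDiscriminantAnalysis(n_components=n_clusters - 1)
        Z = lda.fit_transform(X, labels)
        new_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
        if np.array_equal(new_labels, labels):  # assignments stopped changing
            break
        labels = new_labels
    return labels
```

DEC replaces the linear projection step of such methods with a non-linear embedding learned by a deep network.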
- This work is in part supported by ONR N00014-13-1-0720, NSF IIS- 1338054, and Allen Distinguished Investigator Award
- Aggarwal, Charu C and Reddy, Chandan K. Data clustering: algorithms and applications. CRC Press, 2013.
- Alelyani, Salem, Tang, Jiliang, and Liu, Huan. Feature selection for clustering: A review. Data Clustering: Algorithms and Applications, 2013.
- Bellman, R. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, New Jersey, 1961.
- Bengio, Yoshua, Courville, Aaron, and Vincent, Pascal. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
- Bishop, Christopher M. Pattern recognition and machine learning. Springer, New York, 2006.
- Boutsidis, Christos, Drineas, Petros, and Mahoney, Michael W. Unsupervised feature selection for the k-means clustering problem. In NIPS, 2009.
- Coates, Adam, Ng, Andrew Y, and Lee, Honglak. An analysis of single-layer networks in unsupervised feature learning. In International Conference on Artificial Intelligence and Statistics, pp. 215–223, 2011.
- De la Torre, Fernando and Kanade, Takeo. Discriminative cluster analysis. In ICML, 2006.
- Doersch, Carl, Singh, Saurabh, Gupta, Abhinav, Sivic, Josef, and Efros, Alexei. What makes Paris look like Paris? ACM Transactions on Graphics, 2012.
- Girshick, Ross, Donahue, Jeff, Darrell, Trevor, and Malik, Jitendra. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
- Halkidi, Maria, Batistakis, Yannis, and Vazirgiannis, Michalis. On clustering validation techniques. Journal of Intelligent Information Systems, 2001.
- Hinton, Geoffrey E and Salakhutdinov, Ruslan R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
- Hornik, Kurt. Approximation capabilities of multilayer feedforward networks. Neural networks, 4(2):251–257, 1991.
- Jia, Yangqing, Shelhamer, Evan, Donahue, Jeff, Karayev, Sergey, Long, Jonathan, Girshick, Ross, Guadarrama, Sergio, and Darrell, Trevor. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
- Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- Kuhn, Harold W. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.
- Le, Quoc V. Building high-level features using large scale unsupervised learning. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 8595–8598. IEEE, 2013.
- LeCun, Yann, Bottou, Leon, Bengio, Yoshua, and Haffner, Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Lewis, David D, Yang, Yiming, Rose, Tony G, and Li, Fan. Rcv1: A new benchmark collection for text categorization research. JMLR, 2004.
- Li, Tao, Ma, Sheng, and Ogihara, Mitsunori. Entropy-based criterion in categorical clustering. In ICML, 2004.
- Liu, Huan and Yu, Lei. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 2005.
- Long, Jonathan, Shelhamer, Evan, and Darrell, Trevor. Fully convolutional networks for semantic segmentation. arXiv preprint arXiv:1411.4038, 2014.
- MacQueen, James et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp. 281–297, 1967.
- Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
- Nie, Feiping, Zeng, Zinan, Tsang, Ivor W, Xu, Dong, and Zhang, Changshui. Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering. IEEE Transactions on Neural Networks, 2011.
- Nigam, Kamal and Ghani, Rayid. Analyzing the effectiveness and applicability of co-training. In Proc. of the ninth international conference on Information and knowledge management, 2000.
- Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 2014.
- Tian, Fei, Gao, Bin, Cui, Qing, Chen, Enhong, and Liu, Tie-Yan. Learning deep representations for graph clustering. In AAAI Conference on Artificial Intelligence, 2014.
- van der Maaten, Laurens. Learning a parametric embedding by preserving local structure. In International Conference on Artificial Intelligence and Statistics, 2009.
- van der Maaten, Laurens. Accelerating t-SNE using tree-based algorithms. JMLR, 2014.
- van der Maaten, Laurens and Hinton, Geoffrey. Visualizing data using t-SNE. JMLR, 2008.
- Vincent, Pascal, Larochelle, Hugo, Lajoie, Isabelle, Bengio, Yoshua, and Manzagol, Pierre-Antoine. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. JMLR, 2010.
- Von Luxburg, Ulrike. A tutorial on spectral clustering. Statistics and computing, 2007.
- Xiang, Shiming, Nie, Feiping, and Zhang, Changshui. Learning a mahalanobis distance metric for data clustering and classification. Pattern Recognition, 2008.
- Xing, Eric P, Jordan, Michael I, Russell, Stuart, and Ng, Andrew Y. Distance metric learning with application to clustering with side-information. In NIPS, 2002.
- Yan, Donghui, Huang, Ling, and Jordan, Michael I. Fast approximate spectral clustering. In ACM SIGKDD, 2009.
- Yang, Yi, Xu, Dong, Nie, Feiping, Yan, Shuicheng, and Zhuang, Yueting. Image clustering using local discriminant models and global integration. IEEE Transactions on Image Processing, 2010.
- Ye, Jieping, Zhao, Zheng, and Wu, Mingrui. Discriminative k-means for clustering. In NIPS, 2008.
- Zeiler, Matthew D and Fergus, Rob. Visualizing and understanding convolutional networks. In ECCV. 2014.