# Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction

NeurIPS 2020.

Abstract:

Nonlinear dimensionality reduction of high-dimensional data is challenging as the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of data into known relevant classes is available, it can be used to guide the embedding to avoid…

Code: https://doi.org/10.5281/zenodo.4094851

Introduction

- Dimensionality Reduction (DR) methods aim at mapping a high-dimensional dataset to points in a lower-dimensional embedding space, while preserving some similarity measure between data points.
- Exploratory data analysis typically relies on unsupervised DR techniques, which operate without class information: the data neighborhood structure is prioritized, and its preservation is measured as the discrepancy between data similarities in the original and embedding spaces.
- These objectives derive from visual analytic tasks [3, 4].
- When class labels are available, preserving the class structure becomes a second objective, alongside neighborhood preservation. These objectives are contradictory unless class and data neighborhood structures match each other well: each class then constitutes a distinct area with no cross-class neighborhood relations. This ideal case is very unlikely because the data neighborhood structure and classes do not always match in the data space, and low-dimensional embeddings of high-dimensional data come with unavoidable distortions [3]: false neighbors, which are neighboring points in the embedding but not in the data, and missed neighbors, which are neighbors in the data but not in the embedding (see the sketch below).
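As a concrete illustration of these two distortion types, here is a minimal sketch (not from the paper; the function name and the choice of scikit-learn are assumptions) that counts false and missed neighbors from the k-nearest-neighbor sets of each point in the data space and in the embedding:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def false_and_missed_neighbors(X_data, X_embed, k=10):
    """Per-point counts of false neighbors (k-NN in the embedding only)
    and missed neighbors (k-NN in the data space only)."""
    def knn_sets(X):
        # Ask for k+1 neighbors, then drop each point itself (column 0).
        idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(
            X, return_distance=False)[:, 1:]
        return [set(row) for row in idx]

    data_sets, embed_sets = knn_sets(X_data), knn_sets(X_embed)
    false_n = np.array([len(e - d) for d, e in zip(data_sets, embed_sets)])
    missed_n = np.array([len(d - e) for d, e in zip(data_sets, embed_sets)])
    return false_n, missed_n
```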

Highlights

- Dimensionality Reduction (DR) methods aim at mapping a high-dimensional dataset to points in a lower-dimensional embedding space, while preserving some similarity measure between data points
- Exploratory data analysis typically relies on unsupervised DR techniques, which operate without class information: the data neighborhood structure is prioritized, and its preservation is measured as the discrepancy between data similarities in the original and embedding spaces
- Neighborhood preservation and class separation are contradictory objectives unless class and data neighborhood structures match each other well in both the data and embedding spaces: each class constitutes a distinct area with no cross-class neighborhood relations. This ideal case is very unlikely because the data neighborhood structure and classes do not always match in the data space, and low-dimensional embeddings of high-dimensional data come with unavoidable distortions [3]: false neighbors, which are neighboring points in the embedding but not in the data, and missed neighbors, which are neighbors in the data but not in the embedding
- Our experiments showed that supervised DR techniques tend to over-separate classes that are adjacent or overlapping in the data space, while unsupervised techniques ignore class information entirely
- Future work will extend this approach to other neighborhood embedding techniques, such as t-SNE [9] or Jensen Shannon Embedding (JSE) [19], and consider the semi-supervised framework
- The proposed ClassNeRV method is precisely intended to reduce the bias of supervised techniques, namely the artificial over-separation of classes

Methods

- 4.1 Objectives, Data, Techniques

The authors illustrate the main characteristics of ClassNeRV compared to other unsupervised and supervised DR techniques on a 3D toy dataset (Globe) and on two real high-dimensional datasets (Isolet 5 and Digits).
- The Globe dataset (Section 4.2) contains 512 points randomly distributed on the surface of the unit sphere in three-dimensional Euclidean space R³ (Figure 2a); a generation sketch is given after this list.
- The two classes correspond to the two hemispheres divided at the equator.
- These data cannot be embedded in the plane without distortions, so the final map depends on the trade-off set between the neighborhood (τ*) and class (ε) penalizations (see Section 3.2).
- A random subset of 500 samples is considered to ease the readability of the maps in Figure 6.
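A minimal sketch of how such a Globe dataset can be generated (the paper does not give its sampling code, so the use of normalized Gaussian draws, which are uniform on the sphere, is an assumption):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 512 points uniformly distributed on the unit sphere in R^3:
# isotropic Gaussian draws normalized to unit length.
X = rng.standard_normal((512, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Two classes: the hemispheres on either side of the equator (z = 0).
y = (X[:, 2] >= 0).astype(int)
```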

Results

- 4.3 Isolet Dataset

The 10-NN confusion matrix computed in the 617D data space (Figure 5a) shows all classes having less than 90% accuracy or confused with at least one other class at a rate of 10% or more; the full confusion matrix is given in the supplementary material. A sketch of such a computation is given below.
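A plausible reconstruction of this k-NN confusion matrix (not the authors' code; integer labels and scikit-learn are assumed): classify each point by majority vote among its 10 nearest neighbors in the feature space, then tabulate predictions against true labels.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import confusion_matrix

def knn_confusion(X, y, k=10):
    """Row-normalized confusion matrix of a k-NN majority-vote classifier,
    evaluated in the original feature space (y must hold integer labels)."""
    # k+1 neighbors of every point, dropping the point itself (column 0).
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(
        X, return_distance=False)[:, 1:]
    y = np.asarray(y)
    # Majority vote among each point's k neighbors.
    pred = np.array([np.bincount(y[row]).argmax() for row in idx])
    C = confusion_matrix(y, pred).astype(float)
    return C / C.sum(axis=1, keepdims=True)  # rows: true-class rates
```

With this convention, a diagonal entry below 0.9 or an off-diagonal entry of 0.1 or more flags the class confusions discussed above.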

Conclusion

- ClassNeRV allows data scientists to control class and structure preservation in low dimensional embeddings for exploratory data analysis of labeled data.
- Such analysis can, for instance, help detect whether classes are well separated in a given feature space, which may lead one to question the labels or the features.
- This work proposes an improvement of a dimensionality reduction technique for exploratory data analysis.
- The proposed ClassNeRV method is precisely intended to reduce the bias of supervised techniques, namely the artificial over-separation of classes.


Related work

- Unsupervised Embeddings. Many linear or non-linear algorithms have been proposed, including Principal Component Analysis (PCA) [14], Self-Organizing Maps (SOM) [15], isometric feature mapping (Isomap) [16], Data-Driven High Dimensional Scaling (DD-HDS) [17], Local Affine Multidimensional Projection (LAMP) [18] and Uniform Manifold Approximation and Projection (UMAP) [12]. Among this wide variety of techniques, Neighborhood Embedding (NE) techniques are efficient both at preserving neighborhood structures and in terms of computing time. Their probabilistic framework also provides a theoretical background for interpreting the obtained maps in terms of a neighborhood retrieval task [7]. NE methods compute, for each pair of points i, j, the probabilistic membership of point j to the neighborhood of point i, sometimes called a similarity. These membership degrees are computed both in the data space and in the embedding space, and the mapping is obtained by minimizing the discrepancy between the membership probabilities of the two spaces. These methods include Stochastic Neighbor Embedding (SNE) [8], t-distributed SNE (t-SNE) [9], Jensen Shannon Embedding (JSE) [19] and Neighborhood Retrieval Visualizer (NeRV) [6, 7]. SNE and t-SNE differ by the kernel used to compute the neighborhood membership degrees in the embedding space. JSE and NeRV both extend SNE to control the balance between false and missed neighbors; this tunability makes them the best suited for introducing supervision (a standard formulation is sketched below).
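For reference, here is a standard formulation of these quantities in the usual SNE/NeRV notation (the paper's own notation, e.g. its τ* trade-off, may differ superficially). With data points $x_i$, embedded points $\hat{x}_i$, and per-point bandwidths $\sigma_i$:

$$
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
q_{j|i} = \frac{\exp\left(-\lVert \hat{x}_i - \hat{x}_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert \hat{x}_i - \hat{x}_k\rVert^2 / 2\sigma_i^2\right)}.
$$

SNE minimizes $\sum_i \mathrm{KL}(p_{\cdot|i} \,\|\, q_{\cdot|i})$, while NeRV mixes both directions of the divergence,

$$
E_{\mathrm{NeRV}} = \lambda \sum_i \mathrm{KL}\left(p_{\cdot|i} \,\|\, q_{\cdot|i}\right) + (1-\lambda) \sum_i \mathrm{KL}\left(q_{\cdot|i} \,\|\, p_{\cdot|i}\right),
$$

where the first (recall) term penalizes missed neighbors and the second (precision) term penalizes false neighbors. ClassNeRV builds on this tunability, steering the trade-off according to class information via its τ* and ε penalizations (Section 3.2).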

Funding

- Funding disclosure: The work of Denys Dutykh has been supported by the French National Research Agency through the Investments for the Future Program (ref. ANR-18-EURE-0016, Solar Academy). Jaakko Peltonen was supported by Academy of Finland projects 313748 and 327352.

Study subjects and analysis

samples: 500

True class labels as well as randomly generated labels are considered to evaluate the robustness to mislabeling. A random subset of 500 samples is considered to ease the readability of the maps in Figure 6. The authors compare ClassNeRV to unsupervised PCA [14], Isomap [16], UMAP [12], t-SNE [9], and NeRV [6, 7], and to supervised NCA [29], S-Isomap [10], ClassiMap [5], and S-UMAP [12]. A sketch of how the unsupervised baselines can be run is given below.
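As a minimal sketch, the unsupervised baselines can be run with standard libraries (scikit-learn and umap-learn; this is illustrative, not the authors' experimental pipeline, and NeRV/ClassNeRV itself is not in these libraries):

```python
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE
import umap  # pip install umap-learn

def embed_baselines(X, seed=0):
    """2D embeddings of X with the unsupervised baselines listed above."""
    return {
        "PCA": PCA(n_components=2).fit_transform(X),
        "Isomap": Isomap(n_components=2).fit_transform(X),
        "t-SNE": TSNE(n_components=2, random_state=seed).fit_transform(X),
        "UMAP": umap.UMAP(n_components=2, random_state=seed).fit_transform(X),
    }
```

For ClassNeRV itself, the reference implementation is the archived code cited in the references (https://doi.org/10.5281/zenodo.4094851).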

Reference

- D. Sacha, L. Zhang, M. Sedlmair, J. A. Lee, J. Peltonen, D. Weiskopf, S. C. North, and D. A. Keim, “Visual interaction with dimensionality reduction: A structured literature analysis,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 241–250, 2017.
- J. Wenskovitch, I. Crandell, N. Ramakrishnan, L. House, S. Leman, and C. North, “Towards a systematic combination of dimension reduction and clustering in visual analytics,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, pp. 131–141, Jan 2018.
- L. G. Nonato and M. Aupetit, “Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, pp. 2650–2673, Aug. 2019.
- M. Brehmer, M. Sedlmair, S. Ingram, and T. Munzner, “Visualizing dimensionally-reduced data: interviews with analysts and a characterization of task sequences,” in Proceedings of the Fifth Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization, BELIV 2014, Paris, France, November 10, 2014 (H. Lam, P. Isenberg, T. Isenberg, and M. Sedlmair, eds.), pp. 1–8, ACM, 2014.
- S. Lespinats, M. Aupetit, and A. Meyer-Base, “ClassiMap: A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 29, no. 6, p. 1551008, 2015.
- J. Venna and S. Kaski, “Nonlinear dimensionality reduction as information retrieval,” in Artificial intelligence and statistics, pp. 572–579, 2007.
- J. Venna, J. Peltonen, K. Nybo, H. Aidos, and S. Kaski, “Information retrieval perspective to nonlinear dimensionality reduction for data visualization,” Journal of Machine Learning Research, vol. 11, pp. 451–490, 2010.
- G. E. Hinton and S. T. Roweis, “Stochastic neighbor embedding,” in Advances in neural information processing systems, pp. 857–864, 2003.
- L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
- X. Geng, D.-C. Zhan, and Z.-H. Zhou, “Supervised nonlinear dimensionality reduction for visualization and classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 35, pp. 1098–1107, Dec. 2005.
- J. Peltonen, H. Aidos, and S. Kaski, “Supervised nonlinear dimensionality reduction by Neighbor Retrieval,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1809–1812, Apr. 2009.
- L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” arXiv:1802.03426 [cs, stat], Feb. 2018.
- J. Venna and S. Kaski, “Neighborhood preservation in nonlinear projection methods: An experimental study,” in International Conference on Artificial Neural Networks, pp. 485–491, Springer, 2001.
- K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, pp. 559–572, Nov. 1901.
- T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
- J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” science, vol. 290, no. 5500, pp. 2319–2323, 2000.
- S. Lespinats, M. Verleysen, A. Giron, and B. Fertil, “DD-HDS: A method for visualization and exploration of high-dimensional data,” IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1265–1279, 2007.
- P. Joia, D. Coimbra, J. A. Cuminato, F. V. Paulovich, and L. G. Nonato, “Local Affine Multidimensional Projection,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, pp. 2563–2571, Dec. 2011.
- J. A. Lee, E. Renard, G. Bernard, P. Dupont, and M. Verleysen, “Type 1 and 2 mixtures of Kullback–Leibler divergences as cost functions in dimensionality reduction based on similarity preservation,” Neurocomputing, vol. 112, pp. 92–108, July 2013.
- O. Kouropteva, O. Okun, and M. Pietikäinen, “Supervised locally linear embedding algorithm for pattern recognition,” in Iberian Conference on Pattern Recognition and Image Analysis, pp. 386–394, Springer, 2003.
- S.-q. Zhang, “Enhanced supervised locally linear embedding,” Pattern Recognition Letters, vol. 30, no. 13, pp. 1208–1218, 2009.
- L. Zhao and Z. Zhang, “Supervised locally linear embedding with probability-based distance for classification,” Computers & Mathematics with Applications, vol. 57, no. 6, pp. 919–926, 2009.
- C.-G. Li and J. Guo, “Supervised isomap with explicit mapping,” in First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC’06), vol. 3, pp. 345–348, IEEE, 2006.
- Z. Yang, I. King, Z. Xu, and E. Oja, “Heavy-tailed symmetric stochastic neighbor embedding,” in Advances in neural information processing systems, pp. 2169–2177, 2009.
- R. A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
- S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K.-R. Mullers, “Fisher discriminant analysis with kernels,” in Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No. 98TH8468), pp. 41–48, IEEE, 1999.
- D. De Ridder, M. Loog, and M. J. Reinders, “Local fisher embedding,” in Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., vol. 2, pp. 295–298, IEEE, 2004.
- M. Sugiyama, “Local fisher discriminant analysis for supervised dimensionality reduction,” in Proceedings of the 23rd international conference on Machine learning, pp. 905–912, 2006.
- J. Goldberger, G. E. Hinton, S. T. Roweis, and R. R. Salakhutdinov, “Neighbourhood Components Analysis,” in Advances in Neural Information Processing Systems 17 (L. K. Saul, Y. Weiss, and L. Bottou, eds.), pp. 513–520, MIT Press, 2005.
- R. Salakhutdinov and G. Hinton, “Learning a nonlinear embedding by preserving class neighbourhood structure,” in Artificial Intelligence and Statistics, pp. 412–419, 2007.
- K. Bunte, P. Schneider, B. Hammer, F.-M. Schleif, T. Villmann, and M. Biehl, “Limited Rank Matrix Learning, discriminative dimension reduction and visualization,” Neural Networks, vol. 26, pp. 159–173, Feb. 2012.
- C. de Bodt, D. Mulders, D. L. Sánchez, M. Verleysen, and J. A. Lee, “Class-aware t-SNE: cat-SNE.,” in ESANN, 2019.
- J. Venna and S. Kaski, “Local multidimensional scaling,” Neural Networks, vol. 19, pp. 889–899, July 2006.
- C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the surprising behavior of distance metrics in high dimensional spaces,” in Proceedings of the 8th International Conference on Database Theory, ICDT ’01, (Berlin, Heidelberg), p. 420–434, Springer-Verlag, 2001.
- J. A. Lee and M. Verleysen, “Two key properties of dimensionality reduction methods,” in Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on, pp. 163–170, IEEE, 2014.
- M. Vladymyrov and M. A. Carreira-Perpinan, “Entropic Affinities: Properties and Efficient Numerical Computation.,” in ICML (3), pp. 477–485, 2013.
- J. A. Lee, D. H. Peluffo-Ordóñez, and M. Verleysen, “Multi-scale similarities in stochastic neighbour embedding: Reducing dimensionality while preserving both local and global structure,” Neurocomputing, vol. 169, pp. 246–261, Dec. 2015.
- J. Nocedal and S. J. Wright, Numerical optimization. Springer series in operations research, New York: Springer, 1999.
- Z. Yang, J. Peltonen, and S. Kaski, “Scalable optimization of neighbor embedding for visualization,” in International Conference on Machine Learning, pp. 127–135, 2013.
- L. Van Der Maaten, “Accelerating t-SNE using tree-based algorithms,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3221–3245, 2014.
- J. Venna, Dimensionality reduction for visual exploration of similarity structures. PhD thesis, Helsinki University of Technology, Espoo, 2007. OCLC: 231147068.
- M. Fanty and R. Cole, “Spoken letter recognition,” in Advances in Neural Information Processing Systems, pp. 220–226, 1991.
- D. Dua and E. Karra Taniskidou, “UCI machine learning repository,” 2017.
- E. Alpaydin and C. Kaynak, “Cascading classifiers,” Kybernetika, vol. 34, no. 4, pp. 369–374, 1998.
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
- L. McInnes, J. Healy, N. Saul, and L. Grossberger, “UMAP: Uniform Manifold Approximation and Projection,” The Journal of Open Source Software, vol. 3, no. 29, p. 861, 2018.
- B. Colange, “ClassNeRV,” https://doi.org/10.5281/zenodo.4094851, Oct. 2020.
- F. Degret and S. Lespinats, “Circular background decreases misunderstanding of multidimensional scaling results for naive readers,” in MATEC Web of Conferences, vol. 189, p. 10002, EDP Sciences, 2018.
