Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction

Benoît Colange
Jaakko Peltonen

NeurIPS 2020.

Abstract:

Nonlinear dimensionality reduction of high-dimensional data is challenging as the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of data into known relevant classes is available, it can be used to guide the embedding to avoid…

Introduction
  • Dimensionality Reduction (DR) methods aim to map a high-dimensional dataset to points in a lower-dimensional embedding space while preserving some similarity measure between data points.
  • Exploratory data analysis is the typical use case of unsupervised DR techniques, which operate without knowledge of class information: the data neighborhood structure is prioritized and measured as a discrepancy between data similarities in both the original and embedding spaces.
  • These objectives derive from visual analytic tasks [3, 4].
  • Structure preservation and class separation are contradictory objectives unless class and data neighborhood structures match each other well in both the data and embedding spaces. This ideal case is very unlikely because the data neighborhood structure and classes do not always match in the data space, and low-dimensional embeddings of high-dimensional data come with unavoidable distortions [3]: false neighbors, which are neighboring points in the embedding but not in the data, and missed neighbors, which are neighbors in the data but not in the embedding (a toy count of these two distortion types is sketched below).
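To make the two distortion types concrete, here is a minimal sketch that counts false and missed neighbors from hard k-NN sets. The paper itself works with soft, probabilistic neighborhoods, so the function names and the PCA example below are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (not the authors' code): count false and missed neighbors
# between a dataset X and an embedding Y using hard k-NN sets.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def knn_sets(Z, k):
    """k-nearest-neighbor set of each point (the point itself is dropped)."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(Z).kneighbors(Z)
    return [set(row[1:]) for row in idx]

def count_distortions(X, Y, k=10):
    data_nn = knn_sets(X, k)    # neighborhoods in the data space
    embed_nn = knn_sets(Y, k)   # neighborhoods in the embedding space
    false_n = sum(len(e - d) for d, e in zip(data_nn, embed_nn))   # in embedding, not in data
    missed_n = sum(len(d - e) for d, e in zip(data_nn, embed_nn))  # in data, not in embedding
    return false_n, missed_n

# Example: a random 10D dataset embedded into 2D by PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = PCA(n_components=2).fit_transform(X)
print(count_distortions(X, Y))
```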
Highlights
  • Dimensionality Reduction (DR) methods aim to map a high-dimensional dataset to points in a lower-dimensional embedding space while preserving some similarity measure between data points.
  • Exploratory data analysis is the typical use case of unsupervised DR techniques, which operate without knowledge of class information: the data neighborhood structure is prioritized and measured as a discrepancy between data similarities in both the original and embedding spaces.
  • Structure preservation and class separation are contradictory objectives unless class and data neighborhood structures match each other well in both the data and embedding spaces, that is, each class constitutes a distinct area with no cross-class neighborhood relations. This ideal case is very unlikely because the data neighborhood structure and classes do not always match in the data space, and low-dimensional embeddings of high-dimensional data come with unavoidable distortions [3]: false neighbors, which are neighboring points in the embedding but not in the data, and missed neighbors, which are neighbors in the data but not in the embedding.
  • Our experiments showed that supervised DR techniques tend to over-separate classes that are adjacent or overlapping in the data space, while unsupervised techniques ignore classes entirely.
  • Future work will extend this approach to other neighborhood embedding techniques, such as t-SNE [9] or Jensen Shannon Embedding (JSE) [19], and consider the semi-supervised framework.
  • The proposed ClassNeRV method is specifically intended to reduce this second type of bias.
Methods
  • 4.1 Objectives, Data, Techniques

    The authors illustrate the main characteristics of ClassNeRV compared to other unsupervised and supervised DR techniques, on a 3D toy dataset (Globe) and on two real high-dimensional datasets (Isolet 5 and Digits).
  • The Globe dataset (Section 4.2) contains 512 data points randomly distributed on the surface of the unit sphere in the three-dimensional Euclidean space R3 (Figure 2a); a generation sketch is given after this list.
  • The two classes correspond to the two hemispheres divided at the equator.
  • These data cannot be embedded in the plane without distortions, so the final map depends on the trade-off set between the neighborhood (τ*) and class (ε) penalizations (see Section 3.2).
  • A random subset of 500 samples is considered to ease the readability of the maps in Figure 6.
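As a concrete illustration of the Globe data described above, the following sketch generates 512 points on the unit sphere and labels them by hemisphere. The uniform sampling scheme and the seed are assumptions; the paper only states that the points are randomly distributed on the sphere.

```python
# Sketch of a Globe-like toy dataset (assumption: uniform sampling on the sphere).
import numpy as np

def make_globe(n=512, seed=0):
    rng = np.random.default_rng(seed)
    # Normalizing Gaussian samples gives points uniformly distributed on the unit sphere.
    X = rng.normal(size=(n, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    labels = (X[:, 2] >= 0).astype(int)  # two classes: northern vs. southern hemisphere
    return X, labels

X, labels = make_globe()
print(X.shape, np.bincount(labels))  # (512, 3), roughly balanced classes
```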
Results
  • 4.3 Isolet Dataset. The 10-NN confusion matrix computed in the 617D data space (Figure 5a) shows all classes with less than 90% accuracy or with at least 10% confusion with another class (the full confusion matrix is given in the supplementary material); a sketch of such a computation is given below.
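As an illustration of this kind of analysis, the sketch below computes a row-normalized 10-NN confusion matrix directly in the original feature space. The scikit-learn Digits data is used as a stand-in, since Isolet 5 is not bundled with scikit-learn, and the exact protocol (majority vote over the 10 nearest neighbors, excluding the point itself) is an assumption about the paper's setup.

```python
# Sketch: 10-NN confusion matrix computed in the original (data) space.
# Digits is used as a stand-in for Isolet 5; the voting protocol is an assumption.
import numpy as np
from scipy.stats import mode
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import NearestNeighbors

X, y = load_digits(return_X_y=True)
k = 10
# For each point, find its k nearest neighbors in the feature space (dropping the point itself).
_, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
neighbor_labels = y[idx[:, 1:]]
# Predict each point's class as the majority label among its neighbors.
y_pred = np.asarray(mode(neighbor_labels, axis=1).mode).ravel()
# Row-normalized confusion matrix: entry (i, j) is the fraction of class i predicted as class j.
cm = confusion_matrix(y, y_pred, normalize="true")
print(np.round(cm, 2))
```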
Conclusion
  • ClassNeRV allows data scientists to control class and structure preservation in low-dimensional embeddings for exploratory data analysis of labeled data.
  • Such analysis can, for instance, help detect whether classes are well separated in a given feature space, which may lead one to question the labels or the features.
  • This work proposes an improvement to a dimensionality reduction technique for exploratory data analysis.
  • The proposed ClassNeRV method is specifically intended to reduce this second type of bias.
Related work
  • Unsupervised Embeddings. Many linear or non-linear algorithms have been previously proposed, including Principal Component Analysis (PCA) [14], Self-Organizing Maps (SOM) [15], isometric feature mapping (Isomap) [16], Data-Driven High Dimensional Scaling (DD-HDS) [17], Local Affine Multidimensional Projection (LAMP) [18] and Uniform Manifold Approximation and Projection (UMAP) [12]. Among this wide variety of techniques, Neighborhood Embedding (NE) techniques are efficient both at preserving neighborhood structures and in terms of computing time. Their probabilistic framework also provides a theoretical background for interpreting the obtained maps in terms of a neighborhood retrieval task [7]. NE methods compute, for each pair of points i, j, the probabilistic membership of point j to the neighborhood of point i, sometimes called a similarity. These membership degrees are computed both in the data space and in the embedding space. The mapping is obtained by minimizing the discrepancy of membership probabilities between these two spaces. These methods include Stochastic Neighbor Embedding (SNE) [8], t-distributed SNE (t-SNE) [9], Jensen Shannon Embedding (JSE) [19] and Neighborhood Retrieval Visualizer (NeRV) [6, 7]. SNE and t-SNE differ by the kernel used to compute their neighborhood membership degrees in the embedding space. JSE and NeRV both extend SNE to control the balance between false and missed neighbors. This tunability makes NeRV and JSE the best suited for introducing supervision (a sketch of this cost structure is given below).
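To make the NE cost structure concrete, here is a minimal sketch of a NeRV-type objective: soft neighborhood memberships are built in both spaces and compared with a mixture of the two directed Kullback-Leibler divergences, where one direction penalizes missed neighbors and the other penalizes false neighbors. The Gaussian kernel with a single global bandwidth and the parameter name `lam` are simplifying assumptions; this is not the paper's ClassNeRV formulation, which additionally modulates the trade-off with class information.

```python
# Minimal NeRV-style cost sketch (assumptions: single global bandwidth, parameter name `lam`).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA

def soft_neighborhoods(Z, sigma=1.0):
    """Row-stochastic membership probabilities p_{j|i} from a Gaussian kernel on distances."""
    d2 = squareform(pdist(Z, "sqeuclidean"))
    np.fill_diagonal(d2, np.inf)   # a point is not a member of its own neighborhood
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def nerv_cost(P, Q, lam=0.5, eps=1e-12):
    """lam * KL(P||Q) + (1 - lam) * KL(Q||P).

    The first term grows when data-space neighbors get low embedding membership
    (missed neighbors); the second grows when embedding neighbors have low
    data-space membership (false neighbors)."""
    kl_pq = np.sum(P * np.log((P + eps) / (Q + eps)))
    kl_qp = np.sum(Q * np.log((Q + eps) / (P + eps)))
    return lam * kl_pq + (1.0 - lam) * kl_qp

# Example: cost of a PCA embedding of a random 10D dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y = PCA(n_components=2).fit_transform(X)
print(nerv_cost(soft_neighborhoods(X), soft_neighborhoods(Y)))
```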
Funding
  • Funding disclosure. The work of Denys Dutykh has been supported by the French National Research Agency, through the Investments for the Future Program (ref. ANR-18-EURE-0016, Solar Academy). Jaakko Peltonen was supported by Academy of Finland projects 313748 and 327352.
Study subjects and analysis
samples: 500
True class labels as well as randomly generated labels are considered to evaluate the robustness to mislabeling. A random subset of 500 samples is considered to ease the readability of the maps in Figure 6. We compare ClassNeRV to the unsupervised PCA [14], Isomap [16], UMAP [12], t-SNE [9] and NeRV [6, 7], and to the supervised NCA [29], S-Isomap [10], ClassiMap [5] and S-UMAP [12] (a baseline-embedding sketch is given below).
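As a rough illustration of this comparison protocol, the sketch below embeds a 500-sample subset of the scikit-learn Digits data with a few of the unsupervised baselines listed above (PCA, Isomap, t-SNE). The subset seed is arbitrary, and ClassNeRV, NeRV and the supervised baselines are not reproduced here; the authors' ClassNeRV code is archived on Zenodo (see the reference list).

```python
# Sketch: unsupervised baseline embeddings of a 500-sample Digits subset.
# Only PCA, Isomap and t-SNE are shown; the random seed is an arbitrary assumption.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
subset = rng.choice(len(X), size=500, replace=False)  # 500-sample subset, as in the paper
X, y = X[subset], y[subset]

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "Isomap": Isomap(n_components=2).fit_transform(X),
    "t-SNE": TSNE(n_components=2, random_state=0).fit_transform(X),
}
for name, Y in embeddings.items():
    print(name, Y.shape)  # each embedding is a (500, 2) array of map coordinates
```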

Reference
  • D. Sacha, L. Zhang, M. Sedlmair, J. A. Lee, J. Peltonen, D. Weiskopf, S. C. North, and D. A. Keim, “Visual interaction with dimensionality reduction: A structured literature analysis,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 241–250, 2017.
  • J. Wenskovitch, I. Crandell, N. Ramakrishnan, L. House, S. Leman, and C. North, “Towards a systematic combination of dimension reduction and clustering in visual analytics,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, pp. 131–141, Jan 2018.
  • L. G. Nonato and M. Aupetit, “Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, pp. 2650–2673, Aug. 2019.
  • M. Brehmer, M. Sedlmair, S. Ingram, and T. Munzner, “Visualizing dimensionally-reduced data: interviews with analysts and a characterization of task sequences,” in Proceedings of the Fifth Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization, BELIV 2014, Paris, France, November 10, 2014 (H. Lam, P. Isenberg, T. Isenberg, and M. Sedlmair, eds.), pp. 1–8, ACM, 2014.
  • S. Lespinats, M. Aupetit, and A. Meyer-Base, “ClassiMap: A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 29, p. 150505235857008, May 2015.
  • J. Venna and S. Kaski, “Nonlinear dimensionality reduction as information retrieval,” in Artificial intelligence and statistics, pp. 572–579, 2007.
  • J. Venna, J. Peltonen, K. Nybo, H. Aidos, and S. Kaski, “Information retrieval perspective to nonlinear dimensionality reduction for data visualization,” Journal of Machine Learning Research, vol. 11, no. Feb, pp. 451–490, 2010.
  • G. E. Hinton and S. T. Roweis, “Stochastic neighbor embedding,” in Advances in neural information processing systems, pp. 857–864, 2003.
  • L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of machine learning research, vol. 9, no. Nov, pp. 2579–2605, 2008.
  • X. Geng, D.-C. Zhan, and Z.-H. Zhou, “Supervised nonlinear dimensionality reduction for visualization and classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 35, pp. 1098–1107, Dec. 2005.
  • J. Peltonen, H. Aidos, and S. Kaski, “Supervised nonlinear dimensionality reduction by Neighbor Retrieval,” in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1809–1812, Apr. 2009.
  • L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction,” arXiv:1802.03426 [cs, stat], Feb. 2018. arXiv: 1802.03426.
  • J. Venna and S. Kaski, “Neighborhood preservation in nonlinear projection methods: An experimental study,” in International Conference on Artificial Neural Networks, pp. 485–491, Springer, 2001.
  • K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, pp. 559–572, Nov. 1901.
  • T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
  • J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
  • S. Lespinats, M. Verleysen, A. Giron, and B. Fertil, “DD-HDS: A method for visualization and exploration of high-dimensional data,” IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1265–1279, 2007.
  • P. Joia, D. Coimbra, J. A. Cuminato, F. V. Paulovich, and L. G. Nonato, “Local Affine Multidimensional Projection,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, pp. 2563–2571, Dec. 2011.
  • J. A. Lee, E. Renard, G. Bernard, P. Dupont, and M. Verleysen, “Type 1 and 2 mixtures of Kullback–Leibler divergences as cost functions in dimensionality reduction based on similarity preservation,” Neurocomputing, vol. 112, pp. 92–108, July 2013.
  • O. Kouropteva, O. Okun, and M. Pietikäinen, “Supervised locally linear embedding algorithm for pattern recognition,” in Iberian Conference on Pattern Recognition and Image Analysis, pp. 386–394, Springer, 2003.
  • S.-q. Zhang, “Enhanced supervised locally linear embedding,” Pattern Recognition Letters, vol. 30, no. 13, pp. 1208–1218, 2009.
  • L. Zhao and Z. Zhang, “Supervised locally linear embedding with probability-based distance for classification,” Computers & Mathematics with Applications, vol. 57, no. 6, pp. 919–926, 2009.
  • C.-G. Li and J. Guo, “Supervised isomap with explicit mapping,” in First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC’06), vol. 3, pp. 345–348, IEEE, 2006.
  • Z. Yang, I. King, Z. Xu, and E. Oja, “Heavy-tailed symmetric stochastic neighbor embedding,” in Advances in neural information processing systems, pp. 2169–2177, 2009.
  • R. A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
  • S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K.-R. Mullers, “Fisher discriminant analysis with kernels,” in Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No. 98TH8468), pp. 41–48, IEEE, 1999.
  • D. De Ridder, M. Loog, and M. J. Reinders, “Local fisher embedding,” in Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., vol. 2, pp. 295–298, IEEE, 2004.
  • M. Sugiyama, “Local fisher discriminant analysis for supervised dimensionality reduction,” in Proceedings of the 23rd international conference on Machine learning, pp. 905–912, 2006.
  • J. Goldberger, G. E. Hinton, S. T. Roweis, and R. R. Salakhutdinov, “Neighbourhood Components Analysis,” in Advances in Neural Information Processing Systems 17 (L. K. Saul, Y. Weiss, and L. Bottou, eds.), pp. 513–520, MIT Press, 2005.
  • R. Salakhutdinov and G. Hinton, “Learning a nonlinear embedding by preserving class neighbourhood structure,” in Artificial Intelligence and Statistics, pp. 412–419, 2007.
  • K. Bunte, P. Schneider, B. Hammer, F.-M. Schleif, T. Villmann, and M. Biehl, “Limited Rank Matrix Learning, discriminative dimension reduction and visualization,” Neural Networks, vol. 26, pp. 159–173, Feb. 2012.
  • C. de Bodt, D. Mulders, D. L. Sánchez, M. Verleysen, and J. A. Lee, “Class-aware t-SNE: cat-SNE.,” in ESANN, 2019.
  • J. Venna and S. Kaski, “Local multidimensional scaling,” Neural Networks, vol. 19, pp. 889–899, July 2006.
  • C. C. Aggarwal, A. Hinneburg, and D. A. Keim, “On the surprising behavior of distance metrics in high dimensional spaces,” in Proceedings of the 8th International Conference on Database Theory, ICDT ’01, (Berlin, Heidelberg), p. 420–434, Springer-Verlag, 2001.
  • J. A. Lee and M. Verleysen, “Two key properties of dimensionality reduction methods,” in Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on, pp. 163–170, IEEE, 2014.
  • M. Vladymyrov and M. A. Carreira-Perpinan, “Entropic Affinities: Properties and Efficient Numerical Computation.,” in ICML (3), pp. 477–485, 2013.
  • J. A. Lee, D. H. Peluffo-Ordóñez, and M. Verleysen, “Multi-scale similarities in stochastic neighbour embedding: Reducing dimensionality while preserving both local and global structure,” Neurocomputing, vol. 169, pp. 246–261, Dec. 2015.
  • J. Nocedal and S. J. Wright, Numerical optimization. Springer series in operations research, New York: Springer, 1999.
  • Z. Yang, J. Peltonen, and S. Kaski, “Scalable optimization of neighbor embedding for visualization,” in International Conference on Machine Learning, pp. 127–135, 2013.
  • L. Van Der Maaten, “Accelerating t-SNE using tree-based algorithms,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3221–3245, 2014.
  • J. Venna, Dimensionality reduction for visual exploration of similarity structures. PhD thesis, Helsinki University of Technology, Espoo, 2007. OCLC: 231147068.
  • M. Fanty and R. Cole, “Spoken letter recognition,” in Advances in Neural Information Processing Systems, pp. 220–226, 1991.
  • D. Dua and E. Karra Taniskidou, “UCI machine learning repository,” 2017.
  • E. Alpaydin and C. Kaynak, “Cascading classifiers,” Kybernetika, vol. 34, no. 4, pp. 369–374, 1998.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • L. McInnes, J. Healy, N. Saul, and L. Grossberger, “Umap: Uniform manifold approximation and projection,” The Journal of Open Source Software, vol. 3, no. 29, p. 861, 2018.
  • B. Colange, “ClassNeRV.” https://doi.org/10.5281/zenodo.4094851, Oct. 2020.
  • F. Degret and S. Lespinats, “Circular background decreases misunderstanding of multidimensional scaling results for naive readers,” in MATEC Web of Conferences, vol. 189, p. 10002, EDP Sciences, 2018.