ClassiMap: A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data

International Journal of Pattern Recognition and Artificial Intelligence (2015)

Abstract
Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques that use pairwise similarities between multidimensional data points to represent the data in a plane, enabling visual exploratory analysis. For labeled data, DR techniques face two objectives with potentially different priorities: one is to account for the similarities between data points, the other for the structure of the data classes. Unsupervised DR techniques attempt to preserve the original data similarities but do not consider class labels, so they can map originally separated classes as overlapping ones. Conversely, state-of-the-art supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework in which they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space; as a result, they can map even originally overlapping classes as separated. We propose ClassiMap, a DR technique that optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general because of the reduction of the data dimension. ClassiMap primarily aims to preserve data similarities, but it tends to distribute unavoidable tears preferentially among different-label data and unavoidable false neighbors among same-label data. Standard quality measures for unsupervised mappings say nothing about the preservation of within-class or between-class structures, while the classification accuracy used to evaluate supervised mappings is relevant only in a predictive modeling framework. We propose two measures better suited to evaluating DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap with other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. According to these new measures, ClassiMap appears to provide a better tradeoff between the preservation of pairwise similarities and the preservation of class structure.
Keywords
Multidimensional scaling, exploratory data analysis, labeled data, mapping evaluation, dimensionality reduction, distance preservation
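
The abstract's distinction between tears and false neighbors, split by whether the two points share a label, can be illustrated with a small sketch. This is not the paper's objective function, nor either of its two proposed label-aware measures. It assumes a simple k-nearest-neighbor reading of the two distortions: a false neighbor is a point that is among the k nearest in the map but not in the original space, and a tear is the reverse. The function names `knn_sets` and `label_split_distortions`, the Euclidean distance, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dists, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]

def label_split_distortions(high, low, labels, k):
    """Count false neighbors and tears, split by same vs. different label.

    Assumed definitions (not taken from the paper):
      false neighbor: j is a k-neighbor of i in the map but not originally
      tear:           j is a k-neighbor of i originally but not in the map
    """
    hi_nn = knn_sets(high, k)
    lo_nn = knn_sets(low, k)
    counts = {"false_same": 0, "false_diff": 0, "tear_same": 0, "tear_diff": 0}
    for i in range(len(labels)):
        for j in lo_nn[i] - hi_nn[i]:
            same = labels[i] == labels[j]
            counts["false_same" if same else "false_diff"] += 1
        for j in hi_nn[i] - lo_nn[i]:
            same = labels[i] == labels[j]
            counts["tear_same" if same else "tear_diff"] += 1
    return counts

# Toy data: two well-separated classes in 2D, and a 1D map that interleaves them.
high = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 10.0], [1.0, 10.0]])
low = np.array([[0.0], [5.0], [0.5], [5.5]])
labels = [0, 0, 1, 1]

counts = label_split_distortions(high, low, labels, k=1)
```

In this toy map every tear falls between same-label points and every false neighbor between different-label points, which is the opposite of the distribution the abstract says ClassiMap tends to favor.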