Suboptimal Comparison of Partitions

Journal of Classification (2019)

Abstract
The distinction between classification and clustering is often based on a priori knowledge of classification labels. However, in the purely theoretical situation where a data-generating model is known, the optimal solutions for clustering do not necessarily correspond to the optimal solutions for classification. Exploring this divergence leads us to conclude that no standard measure of either internal or external validation can guarantee a correspondence with optimal clustering performance. We provide recommendations for the suboptimal evaluation of clustering performance. Such suboptimal approaches can provide valuable insight to researchers hoping to add a post hoc interpretation to their clusters. Indices based on pairwise linkage provide the clearest probabilistic interpretation, while a triplet-based index yields information on higher-level structures in the data. Finally, a graphical examination of receiver operating characteristics generated from hierarchical clustering dendrograms can convey information that would be lost in any single-number summary.
Keywords
Classification, Clustering, Sensitivity, Specificity, Triplet index, Hierarchical receiver operating characteristic
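
As an illustration of the pairwise-linkage indices mentioned in the abstract, the following is a minimal sketch of pair-counting sensitivity and specificity between a reference partition and a clustering. The abstract does not give the paper's exact definitions, so the conventions here are assumptions: a pair of objects counts as "linked" when both share a label, sensitivity is the fraction of reference-linked pairs that the clustering also links, and specificity is the fraction of reference-unlinked pairs that the clustering also keeps apart. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_counts(truth, pred):
    """Classify every unordered pair of objects by linkage in both partitions.

    Assumed convention (not taken from the paper): a pair is "linked"
    in a partition when both objects carry the same label.
    Returns (tp, fp, fn, tn) counted over pairs.
    """
    if len(truth) != len(pred):
        raise ValueError("label sequences must have equal length")
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        linked_truth = truth[i] == truth[j]
        linked_pred = pred[i] == pred[j]
        if linked_truth and linked_pred:
            tp += 1  # linked in both partitions
        elif linked_pred:
            fp += 1  # linked only by the clustering
        elif linked_truth:
            fn += 1  # linked only in the reference
        else:
            tn += 1  # separated in both partitions
    return tp, fp, fn, tn


def pairwise_sensitivity_specificity(truth, pred):
    """Pair-level sensitivity and specificity; NaN when a denominator is zero."""
    tp, fp, fn, tn = pairwise_counts(truth, pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    clustering = [0, 0, 0, 1]
    print(pairwise_sensitivity_specificity(reference, clustering))
```

Under these assumed definitions, each value reads as a conditional probability over a randomly chosen pair of objects, which is one way to understand the probabilistic interpretation the abstract attributes to pairwise indices. The loop visits all n(n-1)/2 pairs, so it suits small illustrative data sets rather than large ones.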