An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set

Ali Ismail-Fawaz, Angus Dempster, Chang Wei Tan, Matthieu Herrmann, Lynn Miller, Daniel F. Schmidt, Stefano Berretti, Jonathan Weber, Maxime Devanne, Germain Forestier, Geoffrey I. Webb

CoRR (2023)

Abstract

The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and intentional manipulation. To address these issues, we propose a new approach to presenting the results of benchmark comparisons, the Multiple Comparison Matrix (MCM), that prioritizes pairwise comparisons and precludes the means of manipulating experimental results that exist in current approaches. MCM can be used to show the results of an all-pairs comparison, or to show the results of a comparison between one or more selected algorithms and the state of the art. MCM is implemented in Python and is publicly available.
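
The stability property named in the title can be illustrated with a minimal sketch. It is not the authors' MCM implementation: the function `pairwise_summary`, the comparate names, and the scores are all hypothetical, and the statistics chosen here (mean difference and win/tie/loss counts) are an assumption about what a pairwise summary might contain. The sketch only shows that a statistic computed from two comparates alone does not change when a third comparate is added or removed.

```python
from itertools import combinations
from statistics import mean


def pairwise_summary(scores):
    """Summarise every pair of comparates (hypothetical helper).

    scores maps a comparate name to its per-dataset scores,
    with higher scores assumed to be better.
    """
    summary = {}
    for a, b in combinations(sorted(scores), 2):
        # Each statistic below uses only the scores of a and b.
        diffs = [x - y for x, y in zip(scores[a], scores[b])]
        summary[(a, b)] = {
            "mean_diff": mean(diffs),
            "wins": sum(d > 0 for d in diffs),
            "ties": sum(d == 0 for d in diffs),
            "losses": sum(d < 0 for d in diffs),
        }
    return summary


# Illustrative scores only, not results from the paper.
scores = {
    "A": [0.90, 0.80, 0.70, 0.60],
    "B": [0.85, 0.80, 0.75, 0.50],
    "C": [0.60, 0.90, 0.65, 0.55],
}

full = pairwise_summary(scores)
without_c = pairwise_summary({k: v for k, v in scores.items() if k != "C"})

# The A-versus-B entry is identical with or without comparate C.
assert full[("A", "B")] == without_c[("A", "B")]
```

By contrast, a statistic built on ranks across all comparates, such as an average rank, can shift for A and B whenever C enters or leaves the comparison.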

Keywords

multiple comparison benchmark evaluations, comparate set
