Better to be in agreement than in bad company: a critical analysis of many kappa-like tests assessing one-million 2x2 contingency tables

arXiv (Cornell University), 2022

Abstract
We assessed several agreement coefficients applied to 2x2 contingency tables, which are common in research due to dichotomization by a condition of the subjects (e.g., male or female) or by convenience of classification (e.g., traditional thresholds separating subjects into healthy or diseased, exposed or non-exposed, etc.). More extreme table configurations (e.g., high agreement between raters) are also usual, but some coefficients have problems with imbalanced tables. Here, we not only studied specific estimators, but also developed a general method to study any candidate estimator of agreement. This method was implemented in open-source R code and is available to researchers. We tested the method by verifying the performance of several traditional estimators over all 1,028,789 tables with sizes ranging from 1 to 68. Cohen's kappa showed handicapped behavior similar to Pearson's r, Yule's Q, and Yule's Y. Scott's pi is ambiguous when assessing situations of agreement between raters. Shankar and Bangdiwala's B is mistaken in all situations of neutrality and when there is greater disagreement between raters. Dice's F1 and McNemar's chi-squared use the information of the contingency table only incompletely, showing the poorest performance of all. We concluded that Holley and Guilford's G is the best agreement estimator, closely followed by Gwet's AC1, and that they should be considered the first choices for agreement measurement in 2x2 contingency tables. All procedures and data were implemented in R and are available for download from https://sourceforge.net/projects/tables2x2.
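To make the comparison concrete, the following is a minimal R sketch (not the authors' tables2x2 code; the function name agreement_2x2 and the example table are illustrative assumptions) of how three of the coefficients discussed above are computed from a single 2x2 table with agreement cells a, d and disagreement cells b, c, using the standard textbook formulas for G, Cohen's kappa, and Gwet's AC1.

```r
## Sketch only: standard two-rater, two-category formulas, not the paper's implementation.
agreement_2x2 <- function(a, b, c, d) {
  n  <- a + b + c + d
  po <- (a + d) / n                                  # observed agreement
  ## Holley and Guilford's G: chance agreement fixed at 1/2
  G  <- 2 * po - 1
  ## Cohen's kappa: chance agreement from the two raters' marginal proportions
  pe_kappa <- ((a + b) * (a + c) + (c + d) * (b + d)) / n^2
  kappa    <- (po - pe_kappa) / (1 - pe_kappa)
  ## Gwet's AC1: chance agreement from the mean proportion of "positive" classifications
  q   <- ((a + b) / n + (a + c) / n) / 2
  ac1 <- (po - 2 * q * (1 - q)) / (1 - 2 * q * (1 - q))
  c(G = G, kappa = kappa, AC1 = ac1)
}

## Hypothetical imbalanced table: raters agree on 42 of 44 items,
## yet kappa drops well below G and AC1.
agreement_2x2(a = 40, b = 1, c = 1, d = 2)
```

For this imbalanced example the observed agreement is about 0.95, giving G near 0.91 and AC1 near 0.95, while Cohen's kappa falls to roughly 0.64, illustrating the kind of behavior under imbalance that the abstract refers to.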
Keywords
bad company, tests, kappa-like, one-million