Dropping Incomplete Records is (not so) Straightforward.

Rianne Margaretha Schouten, Victoria Tascau, Gabriel G. Ziegler, Davide Casano, Marco Ardizzone, Michael-Angelos Erotokritou

IDA (2023)

Abstract

A straightforward approach to handling missing values is to drop incomplete records from the dataset. However, for many forms of missingness, this method is known to affect the center and spread of the data distribution. In this paper, we perform an extensive empirical evaluation of the effect of the drop method on the data distribution. In particular, we analyze two scenarios that are likely to occur in practice but are rarely considered in simulation studies: 1) when features are skewed rather than symmetrically distributed, and 2) when multiple forms of missingness occur simultaneously in one feature. Furthermore, we investigate the implications of the drop method for classification accuracy and show that the value of dropping incomplete records is doubtful, even when incomplete test cases are dropped as well.
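As a rough illustration of the first two scenarios (this is not the authors' simulation design), the stdlib-only Python sketch below draws a right-skewed feature and compares its mean and standard deviation before and after dropping incomplete records under three assumed mechanisms: MCAR, a right-tail MNAR mechanism, and a mix of both acting on the same feature. The log-normal distribution, the missingness rates, the median threshold, and the helper names `simulate_drop` and `summarize` are hypothetical choices made only for this sketch.

```python
import random
import statistics


def summarize(values, n_total):
    """Return (mean, standard deviation, fraction of records kept)."""
    return statistics.fmean(values), statistics.stdev(values), len(values) / n_total


def simulate_drop(n=20000, seed=0):
    rng = random.Random(seed)
    # Right-skewed feature: log-normal with mu=0, sigma=1 (an illustrative assumption).
    x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    median = statistics.median(x)

    # MCAR: every value is missing with the same 30% probability.
    mcar = [v for v in x if rng.random() >= 0.3]

    # MNAR (assumed): values above the median go missing with probability 0.6,
    # so the long right tail is under-represented after dropping.
    mnar = [v for v in x if not (v > median and rng.random() < 0.6)]

    # Mixed (assumed): a 15% MCAR component plus a 40% right-tail MNAR
    # component, both acting on the same feature.
    mixed = [
        v for v in x
        if rng.random() >= 0.15 and not (v > median and rng.random() < 0.4)
    ]

    return {
        "full": summarize(x, n),
        "mcar": summarize(mcar, n),
        "mnar": summarize(mnar, n),
        "mixed": summarize(mixed, n),
    }


if __name__ == "__main__":
    for name, (mean, sd, kept) in simulate_drop().items():
        print(f"{name:>5}: mean={mean:.3f} sd={sd:.3f} kept={kept:.2f}")
```

Under these assumptions one would expect the MCAR row to stay close to the full-data mean, while the MNAR and mixed rows show a lower mean and a smaller spread, because the dropped records are concentrated in the right tail of the skewed feature.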

Keywords

incomplete records
