Polar Encoding: A Simple Baseline Approach for Classification with Missing Values

Oliver Urs Lenz, Daniel Peralta, Chris Cornelis

arXiv (Cornell University) (2024)


Abstract

We propose polar encoding, a representation of categorical and numerical [0, 1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the missingness incorporated in attributes (MIA) proposal. Furthermore, we show that categorical and [0, 1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies multiple imputation by chained equations (MICE) and multiple imputation with denoising autoencoders (MIDAS) and — depending on the classifier — about as well or better than mean/mode imputation with missing-indicators.
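The abstract does not spell out the encoding itself. The sketch below assumes the scheme suggested by its barycentric-coordinate reading: a [0, 1]-valued attribute value x becomes the pair (x, 1 - x), a categorical value is one-hot encoded, and a missing value becomes all zeros in either case. The function names (`polar_encode_numeric`, `polar_encode_categorical`, `manhattan`, `split_left`) are hypothetical and are not taken from the authors' code. The sketch checks two claims from the abstract under that assumption: a missing value lies at the same (Manhattan) distance from every observed value, and a threshold split on either of the two coordinates lets a decision tree send missing values together with either the low or the high values.

```python
from typing import List, Optional, Sequence, Tuple


def polar_encode_numeric(x: Optional[float]) -> Tuple[float, float]:
    # Assumed scheme: x in [0, 1] becomes the barycentric pair (x, 1 - x);
    # a missing value (None) becomes (0, 0). No imputation is needed.
    if x is None:
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError(f"value {x!r} outside [0, 1]; rescale the attribute first")
    return (x, 1.0 - x)


def polar_encode_categorical(value: Optional[str], categories: Sequence[str]) -> Tuple[float, ...]:
    # Ordinary one-hot encoding, with a missing value mapped to the all-zeros vector.
    if value is None:
        return tuple(0.0 for _ in categories)
    if value not in categories:
        raise ValueError(f"unknown category {value!r}")
    return tuple(1.0 if c == value else 0.0 for c in categories)


def manhattan(u: Sequence[float], v: Sequence[float]) -> float:
    # L1 distance between two encoded vectors of equal length.
    return sum(abs(a - b) for a, b in zip(u, v))


def split_left(rows: List[Tuple[float, ...]], coord: int, threshold: float) -> List[int]:
    # Indices of rows sent left by an axis-aligned tree test row[coord] <= threshold.
    return [i for i, r in enumerate(rows) if r[coord] <= threshold]


if __name__ == "__main__":
    missing = polar_encode_numeric(None)
    for x in (0.0, 0.25, 0.9, 1.0):
        # Each observed value is at L1 distance 1 from the missing encoding
        # (up to floating-point rounding), since x + (1 - x) = 1.
        print(x, polar_encode_numeric(x), manhattan(polar_encode_numeric(x), missing))

    colours = ["red", "green", "blue"]
    print(polar_encode_categorical("green", colours), polar_encode_categorical(None, colours))

    rows = [polar_encode_numeric(0.1), polar_encode_numeric(0.8), missing]
    # Testing the first coordinate groups the missing row (index 2) with the low value;
    # testing the second coordinate groups it with the high value.
    print(split_left(rows, 0, 0.5), split_left(rows, 1, 0.5))
```

For the three rows above, a split on the first coordinate sends rows 0 and 2 left, while a split on the second sends rows 1 and 2 left. A tree learner that evaluates both candidate splits therefore decides for itself where the missing values go, which is the behaviour the abstract attributes to the missingness incorporated in attributes (MIA) proposal.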


Keywords

barycentric coordinates, classification, decision trees, fuzzy partitions, missingness incorporated in attributes, missing values, nearest neighbours, one-hot encoding
