Handling Missing Data in Decision Trees: A Probabilistic Approach

arxiv(2020)

引用 19|浏览37
暂无评分
摘要
Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine learning models. As such, handling missing data in decision trees is a well studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune parameters of already learned trees by minimizing their "expected prediction loss" w.r.t. our density estimators. We provide brief experiments showcasing effectiveness of our methods compared to few baselines.
更多
查看译文
关键词
decision trees,missing data
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要