Directly Modeling Missing Data in Sequences with RNNs: Improved Classification of Clinical Time Series.

Zachary Chase Lipton,David C. Kale,Randall C. Wetzel

MLHC（2016）

引用 32|浏览22

暂无评分

摘要

We demonstrate a simple strategy to cope with missing data in sequential inputs, addressing the task of multilabel classification of diagnoses given clinical time series. Collected from the intensive care unit (ICU) of a major urban medical center, our data consists of multivariate time series of observations. The data is irregularly sampled, leading to missingness patterns in re-sampled sequences. In this work, we show the remarkable ability of RNNs to make effective use of binary indicators to directly model missing data, improving AUC and F1 significantly. However, while RNNs can learn arbitrary functions of the missing data and observations, linear models can only learn substitution values. For linear models and MLPs, we show an alternative strategy to capture this signal. Additionally, we evaluate LSTMs, MLPs, and linear models trained on missingness patterns only, showing that for several diseases, what tests are run can be more predictive than the results themselves.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要