# Recurrent Neural Networks for Multivariate Time Series with Missing Values

Scientific Reports, Volume 8, 2018, Article 6085

Abstract:

Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness.


Introduction

- The authors' model not only captures the long-term temporal dependencies of time series observations but also utilizes the missing patterns to improve the prediction results.
- These experiments show that the proposed method is suitable for many time series classification problems with missing data, and in particular is readily applicable to the predictive tasks in emerging health care applications.
- These models are widely used in existing work[22,23,24] applying RNNs to health care time series data with missing values or irregular time stamps.

Highlights

- Non-recurrent neural network baselines: We evaluate logistic regression (LR), support vector machines (SVM), and random forests (RF), which are widely used in health care applications
- Our proposed model focuses on making accurate and robust predictions on multivariate time series data with missing values. It relies on information related to the prediction tasks, represented in the missing patterns, to improve prediction performance over the original GRU-RNN baselines
- Off-the-shelf RNN architectures with imputation achieve only performance comparable to random forests and support vector machines, and do not demonstrate the full advantage of representation learning
- To address the above issues, we propose a novel GRU-based model that captures informative missingness by incorporating masking and time intervals directly inside the GRU architecture
- Although in this paper we focused on time-series data arising in intensive care units, we believe that our approaches will be widely useful for a variety of time-series prediction tasks in healthcare and beyond
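The masking-and-decay mechanism highlighted above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the decay here uses a scalar parameter `W` per variable (a diagonal simplification), and the function and variable names (`decay`, `decayed_input`, `x_last`, `x_mean`) are assumptions for illustration.

```python
import numpy as np

def decay(delta, W, b):
    # Trainable decay rate gamma_t = exp(-max(0, W * delta_t + b)):
    # maps the time interval since the last observation to (0, 1].
    return np.exp(-np.maximum(0.0, W * delta + b))

def decayed_input(x, m, x_last, x_mean, gamma_x):
    # Observed entries (m = 1) pass through unchanged; missing entries
    # (m = 0) decay from the last observed value toward the empirical
    # mean as the time gap grows and gamma_x shrinks.
    return m * x + (1.0 - m) * (gamma_x * x_last + (1.0 - gamma_x) * x_mean)

# Two variables: the first just observed, the second missing for a long time.
x      = np.array([5.0, 0.0])   # current raw input (second entry is missing)
m      = np.array([1.0, 0.0])   # masking vector
x_last = np.array([5.0, 3.0])   # last observed value per variable
x_mean = np.array([0.0, 1.0])   # empirical mean per variable
gamma  = decay(np.array([0.0, 100.0]), W=1.0, b=0.0)
print(decayed_input(x, m, x_last, x_mean, gamma))  # ~[5.0, 1.0]
```

With a long gap the decay drives the imputed value to the empirical mean, while a freshly observed variable is left untouched; this is the sense in which the decay is "inside" the recurrent update rather than a preprocessing step.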

Results

- The authors regularly sample the time-series data to obtain fixed-length inputs and apply all baseline imputation methods to fill in the missing values.
- When using simple imputation methods (Mean, Forward, Simple), all prediction models except random forest show improved performance when missingness indicators are concatenated with the inputs.
- To validate GRU-D model and demonstrate how it utilizes informative missing patterns, the authors take the PhysioNet mortality prediction as a study case, and show the input decay plots and hidden decay weight (Wγh) histograms for each input variable.
- Since these RNN models only use the statistical mean from the training examples or forward imputation on the time series, no future information is used when the authors make predictions at each time step on the test time series.
- GRU-D achieves prediction performance similar to the best non-RNN baseline model while using less time series data.
- A series of works comparing and benchmarking the prediction performance of existing machine learning and deep learning models on MIMIC-III datasets has been conducted recently[44,45].
- Similar to existing work[45], which compared results across different cohorts using logistic regression and gradient boosting trees, the authors use logistic regression, SVM, and random forest as baseline prediction models and show a relative improvement of 2.2% in AUROC score on the MIMIC-III dataset from the proposed models over the best of these baselines.
- The authors' proposed model focuses on making accurate and robust predictions on multivariate time series data with missing values.
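The baseline preprocessing described in these results (simple imputation plus concatenated missingness indicators) can be sketched as follows. This is an illustrative NumPy version with hypothetical names; `x_mean` stands in for the per-variable training-set mean, and only forward imputation is shown.

```python
import numpy as np

def impute_and_concat_mask(x, x_mean):
    """Forward-fill each variable, fall back to the training mean before
    the first observation, and concatenate the missingness indicators."""
    T, D = x.shape
    m = (~np.isnan(x)).astype(float)       # 1 where observed, 0 where missing
    filled = x.copy()
    for d in range(D):
        last = x_mean[d]                   # used until the first observation
        for t in range(T):
            if np.isnan(filled[t, d]):
                filled[t, d] = last        # forward imputation
            else:
                last = filled[t, d]
    return np.concatenate([filled, m], axis=1)   # shape (T, 2D)

x = np.array([[np.nan, 2.0],
              [1.0,    np.nan]])
print(impute_and_concat_mask(x, x_mean=np.array([0.0, 0.0])))
# [[0. 2. 0. 1.]
#  [1. 2. 1. 0.]]
```

Because only past observations (or the training mean) are used to fill a gap, no future information leaks into the prediction at any time step, matching the evaluation protocol described above.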

Conclusion

- This model relies on the information related to the prediction tasks, which is represented in the missing patterns, to improve the prediction performance over the original GRU-RNN baselines.
- The authors' proposed GRU-D model with trainable decays has running time and space complexity similar to the original RNN models, and is shown to provide promising performance, pulling significantly ahead of non-deep-learning methods on synthetic and real-world healthcare datasets.
- The authors will explore deep learning approaches to characterize missing-not-at-random data, and will conduct theoretical analysis to understand the behaviors of existing solutions for missing values.

Tables

- Table1: Model performances measured by AUC score (mean ± std) for mortality prediction
- Table2: Model performances measured by average AUC score (mean ± std) for multi-task predictions on real datasets

Contributions

- Develops novel deep learning models, namely GRU-D, as one of the early attempts to exploit informative missingness in deep learning
- Experiments of time series classification tasks on real-world clinical datasets and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis
- Shows some examples from MIMIC-III, a real-world health care dataset, in Fig.
- Develops a novel deep learning model based on GRU, namely GRU-D, to effectively exploit two representations of informative missingness patterns, i.e., masking and time interval
- Introduces a masking vector mt ∈ {0, 1}D to denote which variables are missing at time step t, and maintains a time interval δtd ∈ ℝ for each variable d recording the time since its last observation
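The masking vector and time interval in the last bullet can be computed as in this small sketch (a hypothetical NumPy helper, not the authors' code; `x` is a (T, D) array with NaN marking missing entries):

```python
import numpy as np

def masking_and_intervals(x, timestamps):
    # m_t^d = 1 iff variable d is observed at time step t.
    m = (~np.isnan(x)).astype(float)
    T, D = x.shape
    delta = np.zeros((T, D))
    for t in range(1, T):
        gap = timestamps[t] - timestamps[t - 1]
        # If the previous value was observed, the interval restarts at the
        # gap; otherwise it accumulates the elapsed time since the last
        # observation: delta_t = gap + delta_{t-1}.
        delta[t] = np.where(m[t - 1] == 1.0, gap, gap + delta[t - 1])
    return m, delta

x = np.array([[1.0,    np.nan],
              [np.nan, 2.0],
              [3.0,    np.nan]])
m, delta = masking_and_intervals(x, timestamps=np.array([0.0, 1.0, 2.0]))
# m     -> [[1, 0], [0, 1], [1, 0]]
# delta -> [[0, 0], [1, 1], [2, 1]]
```

Both arrays are the same shape as the input, so they can be fed alongside it; note that delta keeps growing over consecutive missing steps, which is what the decay mechanism consumes.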

Reference

- Rubin, D. B. Inference and missing data. Biometrika 63, 581–592 (1976).
- Johnson, A. et al. Mimic-iii, a freely accessible critical care database. Sci. Data (2016).
- Schafer, J. L. & Graham, J. W. Missing data: our view of the state of the art. Psychol. methods (2002).
- Kreindler, D. M. & Lumsden, C. J. The effects of the irregular sample and missing data in time series analysis. Nonlinear Dyn. Syst. Analysis for Behav. Sci. Using Real Data (2012).
- De Boor, C. A practical guide to splines, vol. 27 (Springer-Verlag, New York, 1978).
- Mondal, D. & Percival, D. B. Wavelet variance analysis for gappy time series. Annals Inst. Stat. Math. 62, 943–966 (2010).
- Rehfeld, K., Marwan, N., Heitzig, J. & Kurths, J. Comparison of correlation analysis techniques for irregularly sampled time series. Nonlinear Process. Geophys. 18 (2011).
- Garca-Laencina, P. J., Sancho-Gómez, J.-L. & Figueiras-Vidal, A. R. Pattern classification with missing data: a review. Neural Comput. Appl. 19 (2010).
- Mazumder, R., Hastie, T. & Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. J. machine learning research 11, 2287–2322 (2010).
- Koren, Y., Bell, R. & Volinsky, C. Matrix factorization techniques for recommender systems. Comput. 42 (2009).
- White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: issues and guidance for practice. Stat. medicine 30, 377–399 (2011).
- Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. Multiple imputation by chained equations: what is it and how does it work? Int. journal methods psychiatric research 20, 40–49 (2011).
- Wells, B. J., Chagin, K. M., Nowacki, A. S. & Kattan, M. W. Strategies for handling missing data in electronic health record derived data. EGEMS 1 (2013).
- Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural computation 9, 1735–1780 (1997).
- Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP, 1724–1734 (2014).
- Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. ICLR (2015).
- Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, 3104–3112 (2014).
- Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Process. Mag. IEEE 29, 82–97 (2012).
- Bengio, Y. & Gingras, F. Recurrent neural networks for missing or asynchronous data. Adv. neural information processing systems 395–401 (1996).
- Tresp, V. & Briegel, T. A solution for missing data in recurrent neural networks with an application to blood glucose prediction. NIPS
- Parveen, S. & Green, P. Speech recognition with missing data using recurrent neural nets. In Advances in Neural Information Processing Systems, 1189–1195 (2001).
- Lipton, Z. C., Kale, D. & Wetzel, R. Directly modeling missing data in sequences with rnns: Improved classification of clinical time series. In Machine Learning for Healthcare Conference, 253–270 (2016).
- Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F. & Sun, J. Doctor ai: Predicting clinical events via recurrent neural networks. In Machine Learning for Healthcare Conference, 301–318 (2016).
- Pham, T., Tran, T., Phung, D. & Venkatesh, S. Deepcare: A deep dynamic memory model for predictive medicine. In Advances in Knowledge Discovery and Data Mining, 30–41 (2016).
- Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. arXiv preprint arXiv:1606.01865 (2016).
- Vodovotz, Y., An, G. & Androulakis, I. P. A systems engineering perspective on homeostasis and disease. Front. bioengineering biotechnology 1 (2013).
- Zhou, L. & Hripcsak, G. Temporal reasoning with medical data—a review with emphasis on medical natural language processing. J. biomedical informatics 40, 183–202 (2007).
- Batista, G. E. & Monard, M. C. et al. A study of k-nearest neighbour as an imputation method. HIS 87, 48 (2002).
- Josse, J. & Husson, F. Handling missing values in exploratory multivariate data analysis methods. J. de la Société Française de Stat. 153, 79–99 (2012).
- Stekhoven, D. J. & Bühlmann, P. Missforest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2011).
- Rubinsteyn, A. & Feldman, S. fancyimpute. https://github.com/hammerlab/fancyimpute (2015).
- English, P. predictive_imputer. https://github.com/log0ymxm/predictive_imputer (2016).
- Jones, E., Oliphant, T. & Peterson, P. Scipy: Open source scientific tools for python. http://www.scipy.org/ (2001).
- Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
- Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, 448–456 (2015).
- Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. JMLR 15 (2014).
- Kingma, D. & Ba, J. Adam: A method for stochastic optimization. ICLR (2015).
- Chollet, F. et al. Keras. https://github.com/keras-team/keras (2015).
- Bergstra, J. et al. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy) (2010).
- Madeo, R. C., Lima, C. A. & Peres, S. M. Gesture unit segmentation using support vector machines: segmenting gestures from rest positions. In SAC (2013).
- Silva, I., Moody, G., Scott, D. J., Celi, L. A. & Mark, R. G. Predicting in-hospital mortality of icu patients: The physionet/computing in cardiology challenge 2012. In CinC (2012).
- Gal, Y. & Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, 1019–1027 (2016).
- Che, Z., Kale, D., Li, W., Bahadori, M. T. & Liu, Y. Deep computational phenotyping. In SIGKDD (2015).
- Purushotham, S., Meng, C., Che, Z. & Liu, Y. Benchmark of deep learning models on large healthcare mimic datasets. arXiv preprint arXiv:1710.08531 (2017).
- Johnson, A. E., Pollard, T. J. & Mark, R. G. Reproducibility in critical care: a mortality prediction case study. In Machine Learning for Healthcare Conference, 361–376 (2017).
- Luo, Y.-F. & Rumshisky, A. Interpretable topic features for post-icu mortality prediction. In AMIA Annual Symposium Proceedings, 827 (2016). Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-24271-9.
