Recurrent Neural Networks for Multivariate Time Series with Missing Values

Scientific Reports, Volume 8, Article 6085 (2018)


Abstract:

Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness.

Introduction
  • The authors' model not only captures the long-term temporal dependencies of time series observations but also utilizes the missing patterns to improve the prediction results.
  • These experiments show that the proposed method is suitable for many time series classification problems with missing data, and in particular is readily applicable to the predictive tasks in emerging health care applications.
  • These models are widely used in existing work[22,23,24] applying RNNs to health care time series data with missing values or irregular time stamps.
Highlights
  • Non-RNN baselines: We evaluate logistic regression (LR), support vector machines (SVM), and random forest (RF), which are widely used in health care applications
  • Our proposed model focuses on making accurate and robust predictions on multivariate time series data with missing values. It relies on information related to the prediction tasks, represented in the missing patterns, to improve prediction performance over the original GRU-RNN baselines
  • Off-the-shelf RNN architectures with imputation can only achieve performance comparable to random forests and support vector machines, and they do not demonstrate the full advantage of representation learning
  • To address the above issues, we propose a novel Gated Recurrent Unit-based model which captures the informative missingness by incorporating masking and time interval directly inside the Gated Recurrent Unit architecture
  • While our paper focuses on time-series data arising in intensive care units, we believe that our approaches will be widely useful for a variety of time-series prediction tasks in healthcare and beyond
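The trainable-decay mechanism highlighted above can be sketched numerically. This is a simplified, per-variable version of the input-decay idea only; the parameter names (w_gamma, b_gamma) are illustrative placeholders for weights that the full model learns jointly with the GRU:

```python
import numpy as np

def decay(delta, w, b):
    # Trainable decay: gamma = exp(-max(0, w * delta + b)),
    # so gamma -> 1 right after an observation and -> 0 as the gap grows.
    return np.exp(-np.maximum(0.0, w * delta + b))

def gru_d_input(x, m, delta, x_last, x_mean, w_gamma=1.0, b_gamma=0.0):
    # Replace a missing input (mask m = 0) by a mix of its last observed
    # value and the empirical mean, weighted by the time since it was seen.
    gamma = decay(delta, w_gamma, b_gamma)
    return m * x + (1 - m) * (gamma * x_last + (1 - gamma) * x_mean)
```

In the full model, a separate learned decay is also applied to the hidden state (the Wγh weights discussed in the Results).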
Results
  • The authors regularly sample the time-series data to obtain a fixed-length input and apply all baseline imputation methods to fill in the missing values.
  • When using simple imputation methods (Mean, Forward, Simple), all the prediction models except random forest show improved performance when missingness indicators are concatenated with the inputs.
  • To validate the GRU-D model and demonstrate how it utilizes informative missing patterns, the authors take PhysioNet mortality prediction as a case study and show the input decay plots and hidden decay weight (Wγh) histograms for each input variable.
  • Since these RNN models only take the statistical mean from the training examples or use forward imputation on the time series, no future information is used when making predictions at each time step for time series in the test dataset.
  • GRU-D achieves prediction performance similar to the best non-RNN baseline model with less time series data.
  • A series of works comparing and benchmarking the prediction performance of existing machine learning and deep learning models on MIMIC-III datasets has been conducted recently[44,45].
  • Similar to existing work[45], which compared results across different cohorts using logistic regression and gradient boosting trees, the authors use logistic regression, SVM, and random forest as baseline prediction models and show a relative improvement of 2.2% in AUROC score on the MIMIC-III dataset from the proposed models over the best of these baselines.
  • The authors' proposed model focuses on making accurate and robust predictions on multivariate time series data with missing values.
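The simple-imputation baselines above (Mean, Forward) with concatenated missingness indicators can be sketched as follows. This is an illustrative helper, not the paper's code; the function name and interface are assumptions:

```python
import numpy as np

def impute_and_concat(x, m, strategy="forward"):
    """x: (T, D) values with NaN where missing; m: (T, D) 0/1 mask (1 = observed).
    Returns (T, 2*D): imputed values concatenated with the missingness mask."""
    filled = x.copy()
    T, D = x.shape
    if strategy == "mean":
        # Fill every gap with the per-variable mean over time.
        col_mean = np.nanmean(x, axis=0)
        idx = np.where(np.isnan(filled))
        filled[idx] = np.take(col_mean, idx[1])
    elif strategy == "forward":
        # Carry the last observation forward; fall back to the mean
        # before a variable's first observation.
        for d in range(D):
            last = np.nanmean(x[:, d])
            for t in range(T):
                if m[t, d]:
                    last = filled[t, d]
                else:
                    filled[t, d] = last
    return np.concatenate([filled, m], axis=1)
```

Concatenating the mask is what lets a non-RNN classifier "see" the missing patterns, which is why most baselines improve with the indicators.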
Conclusion
  • This model relies on the information related to the prediction tasks, which is represented in the missing patterns, to improve the prediction performance over the original GRU-RNN baselines.
  • The authors' proposed GRU-D model with trainable decays has running time and space complexity similar to original RNN models, and is shown to provide promising performance, pulling significantly ahead of non-deep-learning methods on synthetic and real-world healthcare datasets.
  • The authors will explore deep learning approaches to characterize missing-not-at-random data and will conduct theoretical analysis to understand the behaviors of existing solutions for missing values.
Tables
  • Table 1: Model performance measured by AUC score (mean ± std) for mortality prediction
  • Table 2: Model performance measured by average AUC score (mean ± std) for multi-task predictions on real datasets
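The AUC scores reported in these tables measure the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal, dependency-free sketch of that statistic (not the paper's evaluation code):

```python
def auc(y_true, y_score):
    # AUC as the Mann-Whitney U statistic:
    # fraction of (positive, negative) pairs ranked correctly, ties count 1/2.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The "mean ± std" figures then come from repeating the evaluation (e.g. over cross-validation folds) and aggregating the per-run AUCs.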
Funding
  • Develops a novel deep learning model, namely GRU-D, as one of the early attempts to handle missing values in multivariate time series with deep learning
  • Experiments on time series classification tasks using real-world clinical datasets and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis
  • Shows some examples from MIMIC-III, a real-world health care dataset, in Fig
  • Develops a novel deep learning model based on GRU, namely GRU-D, to effectively exploit two representations of informative missingness patterns, i.e., masking and time interval
  • Introduces a masking vector mt ∈ {0, 1}D to denote which variables are missing at time step t, and maintains the time interval δtd ∈ ℝ for each variable d since its last observation
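The masking and time-interval representations described above can be derived from raw timestamps as follows: δ resets to the step gap right after an observation and keeps accumulating while the variable stays missing. A sketch with illustrative function names:

```python
import numpy as np

def mask_and_interval(x, s):
    """x: (T, D) with NaN for missing; s: (T,) observation timestamps.
    Returns the masking m_t in {0, 1}^D and the time interval delta_t^d
    since variable d was last observed (0 at the first time step)."""
    T, D = x.shape
    m = (~np.isnan(x)).astype(float)
    delta = np.zeros((T, D))
    for t in range(1, T):
        gap = s[t] - s[t - 1]
        # Observed at t-1: interval resets to the step gap.
        # Missing at t-1: the previous interval keeps accumulating.
        delta[t] = np.where(m[t - 1] == 1, gap, gap + delta[t - 1])
    return m, delta
```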
References
  • Rubin, D. B. Inference and missing data. Biometrika 63, 581–592 (1976).
  • Johnson, A. et al. MIMIC-III, a freely accessible critical care database. Sci. Data (2016).
  • Schafer, J. L. & Graham, J. W. Missing data: our view of the state of the art. Psychol. Methods (2002).
  • Kreindler, D. M. & Lumsden, C. J. The effects of the irregular sample and missing data in time series analysis. Nonlinear Dynamical Systems Analysis for the Behavioral Sciences Using Real Data (2012).
  • De Boor, C. A Practical Guide to Splines (Springer-Verlag, New York, 1978).
  • Mondal, D. & Percival, D. B. Wavelet variance analysis for gappy time series. Ann. Inst. Stat. Math. 62, 943–966 (2010).
  • Rehfeld, K., Marwan, N., Heitzig, J. & Kurths, J. Comparison of correlation analysis techniques for irregularly sampled time series. Nonlinear Process. Geophys. 18 (2011).
  • García-Laencina, P. J., Sancho-Gómez, J.-L. & Figueiras-Vidal, A. R. Pattern classification with missing data: a review. Neural Comput. Appl. 19 (2010).
  • Mazumder, R., Hastie, T. & Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11, 2287–2322 (2010).
  • Koren, Y., Bell, R. & Volinsky, C. Matrix factorization techniques for recommender systems. Computer 42 (2009).
  • White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
  • Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011).
  • Wells, B. J., Chagin, K. M., Nowacki, A. S. & Kattan, M. W. Strategies for handling missing data in electronic health record derived data. EGEMS 1 (2013).
  • Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
  • Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP, 1724–1734 (2014).
  • Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In ICLR (2015).
  • Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 3104–3112 (2014).
  • Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97 (2012).
  • Bengio, Y. & Gingras, F. Recurrent neural networks for missing or asynchronous data. In Advances in Neural Information Processing Systems, 395–401 (1996).
  • Tresp, V. & Briegel, T. A solution for missing data in recurrent neural networks with an application to blood glucose prediction. In NIPS.
  • Parveen, S. & Green, P. Speech recognition with missing data using recurrent neural nets. In Advances in Neural Information Processing Systems, 1189–1195 (2001).
  • Lipton, Z. C., Kale, D. & Wetzel, R. Directly modeling missing data in sequences with RNNs: improved classification of clinical time series. In Machine Learning for Healthcare Conference, 253–270 (2016).
  • Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F. & Sun, J. Doctor AI: predicting clinical events via recurrent neural networks. In Machine Learning for Healthcare Conference, 301–318 (2016).
  • Pham, T., Tran, T., Phung, D. & Venkatesh, S. DeepCare: a deep dynamic memory model for predictive medicine. In Advances in Knowledge Discovery and Data Mining, 30–41 (2016).
  • Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. arXiv preprint arXiv:1606.01865 (2016).
  • Vodovotz, Y., An, G. & Androulakis, I. P. A systems engineering perspective on homeostasis and disease. Front. Bioeng. Biotechnol. 1 (2013).
  • Zhou, L. & Hripcsak, G. Temporal reasoning with medical data—a review with emphasis on medical natural language processing. J. Biomed. Inform. 40, 183–202 (2007).
  • Batista, G. E. & Monard, M. C. A study of k-nearest neighbour as an imputation method. HIS 87, 48 (2002).
  • Josse, J. & Husson, F. Handling missing values in exploratory multivariate data analysis methods. J. de la Société Française de Stat. 153, 79–99 (2012).
  • Stekhoven, D. J. & Bühlmann, P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2011).
  • Rubinsteyn, A. & Feldman, S. fancyimpute. https://github.com/hammerlab/fancyimpute (2015).
  • English, P. predictive_imputer. https://github.com/log0ymxm/predictive_imputer (2016).
  • Jones, E., Oliphant, T. & Peterson, P. SciPy: open source scientific tools for Python. http://www.scipy.org/ (2001).
  • Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  • Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, 448–456 (2015).
  • Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. JMLR 15 (2014).
  • Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In ICLR (2015).
  • Chollet, F. et al. Keras. https://github.com/keras-team/keras (2015).
  • Bergstra, J. et al. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy) (2010).
  • Madeo, R. C., Lima, C. A. & Peres, S. M. Gesture unit segmentation using support vector machines: segmenting gestures from rest positions. In SAC (2013).
  • Silva, I., Moody, G., Scott, D. J., Celi, L. A. & Mark, R. G. Predicting in-hospital mortality of ICU patients: the PhysioNet/Computing in Cardiology Challenge 2012. In CinC (2012).
  • Gal, Y. & Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, 1019–1027 (2016).
  • Che, Z., Kale, D., Li, W., Bahadori, M. T. & Liu, Y. Deep computational phenotyping. In SIGKDD (2015).
  • Purushotham, S., Meng, C., Che, Z. & Liu, Y. Benchmark of deep learning models on large healthcare MIMIC datasets. arXiv preprint arXiv:1710.08531 (2017).
  • Johnson, A. E., Pollard, T. J. & Mark, R. G. Reproducibility in critical care: a mortality prediction case study. In Machine Learning for Healthcare Conference, 361–376 (2017).
  • Luo, Y.-F. & Rumshisky, A. Interpretable topic features for post-ICU mortality prediction. In AMIA Annual Symposium Proceedings, 827 (2016).

Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-24271-9.