# Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values

AAAI Conference on Artificial Intelligence (AAAI), 2020.

Abstract:

Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks ...

Introduction

- Multivariate time series (MTS) forecasting is widely used in many applications such as weather forecasting (Xingjian et al 2015), clinical diagnosis (Che et al 2018), sales forecasting (Wu et al 2018; Wu et al 2019) and traffic analysis (Yao et al 2019b; Yao et al 2018; Yao et al 2019a; Tang et al 2019).
- Modeling local and global temporal dynamics is very promising for MTS forecasting with missing values.
- The authors study a new problem of MTS forecasting with missing values by exploring local and global temporal dynamics.

Highlights

- Recurrent neural networks (RNNs), a class of deep learning frameworks designed for modeling sequential data, have been successfully applied to this problem
- We investigate a novel problem of exploring local and global temporal dynamics for MTS forecasting with missing values
- We propose a new framework, LGnet, which adopts a memory network to capture global temporal patterns using local statistics as keys
- To make the generated MTS more realistic, we further adopt adversarial training to enhance the modeling of global temporal data distribution
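
The adversarial component can be sketched as a standard GAN objective (Goodfellow et al 2014) combined with the forecasting loss. The snippet below is a minimal numpy illustration, not the authors' implementation; the function names, the discriminator probabilities, and the `lam` weight (standing in for the hyper-parameter λ analyzed in Table 3) are assumptions for illustration.

```python
import numpy as np

def adversarial_losses(d_real, d_fake, eps=1e-8):
    # Standard GAN losses on discriminator probabilities for real MTS
    # windows (d_real) and for windows containing imputed values (d_fake).
    d_loss = -np.log(d_real + eps).mean() - np.log(1.0 - d_fake + eps).mean()
    g_loss = -np.log(d_fake + eps).mean()  # pushes imputed MTS toward "real"
    return d_loss, g_loss

def joint_objective(forecast_loss, g_loss, lam=0.1):
    # Joint training objective sketch: forecasting error plus a weighted
    # adversarial term, so generated values must also look realistic.
    return forecast_loss + lam * g_loss
```

The generator term falls as the discriminator is fooled, so minimizing the joint objective trades prediction accuracy against realism of the imputed sequences.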

Results

- The authors propose a novel framework, LGnet, with a memory module that captures global temporal dynamics for missing values and adversarial training that enhances the modeling of the global temporal distribution.
- The authors first extract local statistical features for every time interval and use them as keys to query a memory component, which is jointly optimized with an LSTM on all MTS data.
- For each variable in an MTS, the authors first capture informative statistics from the local context of the time series, then leverage these local statistics as keys to query the memory component, which returns representation vectors encoding global temporal dynamics.
- The model parameters θ of LGnet, which include the LSTM and the memory component, are learned by minimizing the forecasting error summed over all J training samples (j = 1, ..., J), where Mjp is the mask matrix of the j-th MTS data sample Xj over the predicted variables and ⊙ denotes the element-wise product used to restrict the error to observed entries.
- Linear Regression (LR): Because a conventional linear regression model cannot directly handle missing values, the authors concatenate each MTS with its mask matrix as the input features to train LR for the forecasting task.
- The authors report the performance on the two datasets for k = 1, 2, 3 in Table 1, and make the following observations: (i) LGnet outperforms all the baseline methods for the majority of the cases, which shows the effectiveness of the memory module and adversarial learning for multivariate time series forecasting with missing values.
- The memory module explores global temporal dynamics and generates appropriate estimations for missing values; (ii) when k increases, i.e., when forecasting values farther into the future, the performance of all methods decreases, which is reasonable because far-future values are harder to forecast than near ones.
- LGnet significantly outperforms LGnet-adv, the variant trained without adversarial learning, indicating that modeling the global temporal data distribution with adversarial training benefits forecasting.
- This is because the original MTS forecasting objective is less effective under a high missing ratio, as it relies only on the observed parts of the time series.
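
The memory lookup with local-statistic keys and the masked forecasting objective described above can be sketched as follows. This is a minimal numpy illustration under assumed choices (mean/std/last-observation as the local statistics, dot-product attention over memory slots), not the authors' implementation.

```python
import numpy as np

def local_statistics(window, mask):
    # Local statistics of one variable over a time window, computed on
    # observed entries only (mask == 1). Mean/std/last-observed are
    # illustrative choices of "local statistics".
    obs = window[mask.astype(bool)]
    if obs.size == 0:
        return np.zeros(3)
    return np.array([obs.mean(), obs.std(), obs[-1]])

def query_memory(key, mem_keys, mem_vals):
    # Attention-style memory read: softmax similarity between the
    # local-statistic key and each memory slot's key, then a weighted
    # sum of the slots' value vectors (representations carrying
    # global temporal dynamics).
    scores = mem_keys @ key                # (num_slots,)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ mem_vals                    # (value_dim,)

def masked_forecast_loss(pred, target, mask_p):
    # Squared error restricted to observed predicted entries, i.e. the
    # element-wise product with the mask matrix M_p.
    m = mask_p.astype(float)
    return float((((pred - target) * m) ** 2).sum() / max(m.sum(), 1.0))
```

Because the loss is masked, unobserved ground-truth entries contribute nothing, which is why the adversarial term is needed to constrain the generated values elsewhere.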

Conclusion

- The authors propose a new framework, LGnet, which adopts a memory network to capture global temporal patterns using local statistics as keys.
- To make the generated MTS more realistic, the authors further adopt adversarial training to enhance the modeling of global temporal data distribution.
- Experimental results on four large-scale real-world datasets show the efficacy of LGnet.

Tables

- Table 1: MTS forecasting performance on Beijing Air and PhysioNet
- Table 2: MTS forecasting performance of variants
- Table 3: Analysis of hyper-parameter λ

Related work

- Various methods have been proposed for MTS forecasting, such as Autoregressive (AR), Vector Autoregression (VAR), and Autoregressive Moving Average (ARMA) models, as well as standard regression models (e.g., support vector regression (Smola and Scholkopf 2004), linear regression, and regression-tree methods (Chen and Guestrin 2016)). Inspired by the recent success of deep neural networks, many RNN-based methods (Lai et al 2018; Qin et al 2017) have been developed for MTS forecasting. Even vanilla RNNs, such as GRU (Chung et al 2014) and LSTM (Hochreiter and Schmidhuber 1997), can significantly outperform non-deep-learning models (Chang et al 2018). However, none of these approaches can handle input with missing values.
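
For concreteness, a first-order VAR baseline of the kind listed above can be fitted by ordinary least squares. This is a generic sketch with illustrative function names, not code from any cited work, and, as noted, it assumes fully observed input.

```python
import numpy as np

def fit_var1(X):
    # Least-squares fit of a first-order VAR model x_t = A @ x_{t-1}:
    # stack lagged pairs and solve for the transition matrix A.
    # No missing-value handling, matching the limitation noted above.
    past, future = X[:-1], X[1:]
    A_t, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_t.T

def forecast_var1(A, x_last, k=1):
    # Iterate the fitted transition k steps ahead from the last observation.
    x = x_last
    for _ in range(k):
        x = A @ x
    return x
```

On noiseless data generated by a true transition matrix, the least-squares fit recovers it exactly, which makes the sketch easy to sanity-check.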

To handle missing values in MTS, the simplest solution would be removing all samples with missing values, such as pairwise deletion (Marsh 1998). Obviously, such methods discard much useful information, especially under a high missing ratio (King et al 1998). General data-imputation methods such as statistical imputation (e.g., mean, median), EM-based imputation (Nelwamondo, Mohamed, and Marwala 2007), K-nearest neighbors (Friedman, Hastie, and Tibshirani 2001), and matrix factorization (Friedman, Hastie, and Tibshirani 2001) can be applied to the unobserved variables. However, these general approaches fail to model the temporal dynamics of time series. Even if MTS imputation methods, such as multivariate imputation by chained equations (Azur et al 2011) and generative adversarial networks (Luo et al 2018), are applied to fill in missing values first, training a forecasting model on pre-processed MTS data leads to sub-optimal results, since the temporal patterns of missing values are entirely isolated from the forecasting model (Wells et al 2013). To tackle this issue, some researchers propose end-to-end frameworks that jointly estimate missing values and forecast future MTS. Che et al (2018) introduce GRU-D, which imputes missing values using a linear combination of statistical features. Yoon, Zame, and van der Schaar (2017) propose M-RNN, which leverages a bi-directional RNN for imputation. Cao et al (2018) model the relationships between missing variables to simultaneously perform imputation and classification/regression in one neural graph. However, these solutions focus on localized temporal dependencies and fail to model global temporal dynamics.
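
For contrast with the end-to-end approaches, a general imputation baseline such as column-wise mean imputation looks like this (a generic numpy sketch, not from the paper). It fills gaps per variable while ignoring temporal order entirely, which is exactly the limitation the jointly-trained methods address.

```python
import numpy as np

def mean_impute(X, mask):
    # Column-wise mean imputation: each variable's missing entries
    # (mask == 0) are filled with the mean of its observed entries.
    # Temporal order is ignored, so dynamics around the gaps are lost.
    X = X.astype(float).copy()
    for d in range(X.shape[1]):
        obs = mask[:, d].astype(bool)
        X[~obs, d] = X[obs, d].mean() if obs.any() else 0.0
    return X
```

A forecasting model trained on such pre-imputed data never sees which values were filled in, which is the "isolation" problem noted above.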

Funding

- This material is based upon work supported by, or in part by, the National Science Foundation (NSF) under grant #1909702

Reference

- [Arjovsky, Chintala, and Bottou 2017] Arjovsky, M.; Chintala, S.; and Bottou, L. 2017. Wasserstein gan. arXiv:1701.07875.
- [Azur et al. 2011] Azur, M. J.; Stuart, E. A.; Frangakis, C.; and Leaf, P. J. 2011. Multiple imputation by chained equations: what is it and how does it work? International journal of methods in psychiatric research 20(1):40–49.
- [Bengio et al. 2015] Bengio, S.; Vinyals, O.; Jaitly, N.; and Shazeer, N. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NeurIPS, 1171–1179.
- [Box et al. 2015] Box, G. E.; Jenkins, G. M.; Reinsel, G. C.; and Ljung, G. M. 2015. Time series analysis: forecasting and control. John Wiley & Sons.
- [Cao et al. 2018] Cao, W.; Wang, D.; Li, J.; Zhou, H.; Li, L.; and Li, Y. 2018. Brits: Bidirectional recurrent imputation for time series. arXiv:1805.10572.
- [Chang et al. 2018] Chang, Y.-Y.; Sun, F.-Y.; Wu, Y.-H.; and Lin, S.-D. 2018. A memory-network based solution for multivariate time-series forecasting. arXiv:1809.02105.
- [Che et al. 2018] Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; and Liu, Y. 2018. Recurrent neural networks for multivariate time series with missing values. Scientific reports 8(1):6085.
- [Chen and Guestrin 2016] Chen, T., and Guestrin, C. 2016. Xgboost: A scalable tree boosting system. In KDD, 785–794. ACM.
- [Cho et al. 2014] Cho, K.; Van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078.
- [Chung et al. 2014] Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555.
- [Friedman, Hastie, and Tibshirani 2001] Friedman, J.; Hastie, T.; and Tibshirani, R. 2001. The elements of statistical learning, volume 1. Springer Series in Statistics. New York, NY: Springer.
- [García-Laencina, Sancho-Gómez, and Figueiras-Vidal 2010] García-Laencina, P. J.; Sancho-Gómez, J.-L.; and Figueiras-Vidal, A. R. 2010. Pattern classification with missing data: a review. Neural Computing and Applications 19(2):263–282.
- [Goodfellow et al. 2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In NeurIPS, 2672–2680.
- [Hochreiter and Schmidhuber 1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
- [King et al. 1998] King, G.; Honaker, J.; Joseph, A.; and Scheve, K. 1998. List-wise deletion is evil: what to do about missing data in political science. In APSA.
- [Kumar et al. 2016] Kumar, A.; Irsoy, O.; Ondruska, P.; Iyyer, M.; Bradbury, J.; Gulrajani, I.; Zhong, V.; Paulus, R.; and Socher, R. 2016. Ask me anything: Dynamic memory networks for natural language processing. In ICML, 1378–1387.
- [Lai et al. 2018] Lai, G.; Chang, W.-C.; Yang, Y.; and Liu, H. 2018. Modeling long-and short-term temporal patterns with deep neural networks. In SIGIR. ACM.
- [Luo et al. 2018] Luo, Y.; Cai, X.; Zhang, Y.; Xu, J.; et al. 2018. Multivariate time series imputation with generative adversarial networks. In NeurIPS, 1603–1614.
- [Marsh 1998] Marsh, H. W. 1998. Pairwise deletion for missing data in structural equation models: Nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling: A Multidisciplinary Journal 5(1).
- [Nelwamondo, Mohamed, and Marwala 2007] Nelwamondo, F. V.; Mohamed, S.; and Marwala, T. 2007. Missing data: A comparison of neural network and expectation maximization techniques. Current Science.
- [Qin et al. 2017] Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; and Cottrell, G. 2017. A dual-stage attention-based recurrent neural network for time series prediction. arXiv:1704.02971.
- [Shu et al. 2018] Shu, K.; Wang, S.; Le, T.; Lee, D.; and Liu, H. 2018. Deep headline generation for clickbait detection. In ICDM. IEEE.
- [Silva et al. 2012] Silva, I.; Moody, G.; Scott, D. J.; Celi, L. A.; and Mark, R. G. 2012. Predicting in-hospital mortality of icu patients: The physionet/computing in cardiology challenge 2012. Computing in cardiology 39:245.
- [Smola and Scholkopf 2004] Smola, A. J., and Scholkopf, B. 2004. A tutorial on support vector regression. Statistics and computing 14(3):199–222.
- [Sukhbaatar et al. 2015] Sukhbaatar, S.; Weston, J.; Fergus, R.; et al. 2015. End-to-end memory networks. In NeurIPS, 2440– 2448.
- [Sun et al. 2019] Sun, Y.; Wang, S.; Hsieh, T.-Y.; Tang, X.; and Honavar, V. 2019. Megan: a generative adversarial network for multi-view network embedding. arXiv preprint arXiv:1909.01084.
- [Tang et al. 2017] Tang, J.; Wang, Y.; Zheng, K.; and Mei, Q. 2017. End-to-end learning for short text expansion. In KDD, 1105–1113. ACM.
- [Tang et al. 2019] Tang, X.; Gong, B.; Yu, Y.; Yao, H.; Li, Y.; Xie, H.; and Wang, X. 2019. Joint modeling of dense and incomplete trajectories for citywide traffic volume inference. In WWW. ACM.
- [Wells et al. 2013] Wells, B. J.; Chagin, K. M.; Nowacki, A. S.; and Kattan, M. W. 2013. Strategies for handling missing data in electronic health record derived data. Egems 1(3).
- [Weston, Chopra, and Bordes 2014] Weston, J.; Chopra, S.; and Bordes, A. 2014. Memory networks. arXiv:1410.3916.
- [Wu et al. 2018] Wu, X.; Shi, B.; Dong, Y.; Huang, C.; Faust, L.; and Chawla, N. V. 2018. Restful: Resolution-aware forecasting of behavioral time series data. In CIKM, 1073–1082. ACM.
- [Wu et al. 2019] Wu, X.; Shi, B.; Dong, Y.; Huang, C.; and Chawla, N. V. 2019. Neural tensor factorization for temporal interaction learning. In WSDM, 537–545. ACM.
- [Xingjian et al. 2015] Xingjian, S.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; and Woo, W.-c. 2015. Convolutional lstm network: A machine learning approach for precipitation nowcasting. In NeurIPS, 802–810.
- [Yao et al. 2018] Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; and Li, Z. 2018. Deep multi-view spatialtemporal network for taxi demand prediction. In AAAI.
- [Yao et al. 2019a] Yao, H.; Liu, Y.; Wei, Y.; Tang, X.; and Li, Z. 2019a. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction. WWW.
- [Yao et al. 2019b] Yao, H.; Tang, X.; Wei, H.; Zheng, G.; and Li, Z. 2019b. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In AAAI.
- [Yi et al. 2016] Yi, X.; Zheng, Y.; Zhang, J.; and Li, T. 2016. ST-MVL: filling missing values in geo-sensory time series data. In IJCAI.
- [Yoon, Zame, and van der Schaar 2017] Yoon, J.; Zame, W. R.; and van der Schaar, M. 2017. Multi-directional recurrent neural networks: A novel method for estimating missing data.
- [Yu et al. 2017] Yu, L.; Zhang, W.; Wang, J.; and Yu, Y. 2017. Seqgan: Sequence generative adversarial nets with policy gradient. In AAAI.
