Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values

AAAI Conference on Artificial Intelligence, 2020.


Abstract:

Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks…

Introduction
Highlights
  • Multivariate time series (MTS) forecasting is widely used in many applications such as weather forecasting (Xingjian et al. 2015), clinical diagnosis (Che et al. 2018), sales forecasting (Wu et al. 2018; Wu et al. 2019), and traffic analysis (Yao et al. 2019b; Yao et al. 2018; Yao et al. 2019a; Tang et al. 2019)
  • Recurrent neural networks (RNNs), a class of deep learning frameworks designed for modeling sequential data, have been successfully applied to this problem
  • We report the performance on the two datasets for k = 1, 2, 3 in Table 1 and observe that LGnet outperforms all the baseline methods in the majority of cases, which shows the effectiveness of the memory module and adversarial learning for multivariate time series forecasting with missing values
  • We investigate a novel problem of exploring local and global temporal dynamics for MTS forecasting with missing values
  • We propose a new framework, LGnet, which adopts a memory network to capture global temporal patterns using local statistics as keys
  • To make the generated MTS more realistic, we further adopt adversarial training to enhance the modeling of global temporal data distribution
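As a rough illustration of the highlighted idea, the memory lookup can be sketched as attention over learned memory slots, with local statistics as the query. All names and shapes below are hypothetical, not taken from the paper:

```python
import numpy as np

def query_memory(local_stats, memory_keys, memory_values):
    """Read from a global memory using local statistics as the query.

    local_stats:   (d,)   statistics from a variable's local context
    memory_keys:   (m, d) learned keys, jointly optimized with the LSTM
    memory_values: (m, h) representations carrying global temporal dynamics
    Returns an (h,) vector mixing globally similar temporal patterns.
    """
    scores = memory_keys @ local_stats        # similarity to each memory slot
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory_values            # attention-weighted read
```

In this sketch, a variable whose local statistics resemble a stored pattern receives a representation dominated by that pattern, which is the intuition behind using local statistics as keys into a globally shared memory.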
Results
  • The authors propose a novel framework, LGnet, with a memory module to capture global temporal dynamics for missing values and adversarial training to enhance the modeling of the global temporal distribution.
  • The authors first extract local statistical features for every time interval and use them as keys to query a memory component, which is jointly optimized with the LSTM on all MTS data.
  • For each variable in an MTS, the authors first capture informative statistics from the local context of this time series, then leverage these local statistics as keys to query the memory component, which returns representation vectors with global temporal dynamics.
  • In the training objective, θ denotes the parameters of LGnet, including those of the LSTM and the memory component; M_p^j is the mask matrix of the j-th MTS data sample X^j over the predicted variables; and the mask is applied to the prediction error via the element-wise (Hadamard) product, so that only observed entries contribute to the loss.
  • Linear Regression (LR): Because conventional linear regression models cannot directly handle missing values, the authors concatenate each MTS with its mask matrix as the input features to train LR for the forecasting task.
  • The authors report the performance on the two datasets for k = 1, 2, 3 in Table 1, and make the following observations: (i) LGnet outperforms all the baseline methods for the majority of the cases, which shows the effectiveness of the memory module and adversarial learning for multivariate time series forecasting with missing values.
  • The memory module explores global temporal dynamics and generates appropriate estimations for missing values; (ii) when k increases, i.e., when forecasting far future values, the performance of all the methods decreases, which is reasonable because it’s more difficult to forecast far future values than near ones.
  • LGnet significantly outperforms LGnet_adv, indicating that modeling global temporal dynamics with the memory module benefits the forecasting.
  • This is because the original MTS forecasting objective is less efficient with a high missing ratio, as it only relies on observed parts of the time series.
  • The authors investigate a novel problem of exploring local and global temporal dynamics for MTS forecasting with missing values.
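The masked objective described above can be sketched as a mean squared error restricted to observed entries via the mask matrix. The function name and the averaging choice are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def masked_forecast_loss(pred, target, mask):
    """Mean squared error over observed entries only.

    pred, target, mask: (T, D) arrays; mask is 1 where a value is observed.
    The mask is applied element-wise (Hadamard product), so unobserved
    entries contribute nothing to the loss.
    """
    err = mask * (pred - target)
    return float((err ** 2).sum() / max(mask.sum(), 1.0))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [5.0, 4.0]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])   # the (0, 1) entry is missing
loss = masked_forecast_loss(pred, target, mask)
```

Masking the error, rather than imputing targets before computing the loss, is what keeps the objective well-defined when the ground truth itself has missing values.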
Conclusion
  • The authors propose a new framework, LGnet, which adopts a memory network to capture global temporal patterns using local statistics as keys.
  • To make the generated MTS more realistic, the authors further adopt adversarial training to enhance the modeling of global temporal data distribution.
  • Experimental results on four large-scale real-world datasets show the efficacy of LGnet.
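The adversarial training mentioned in the conclusion can be illustrated with standard GAN losses, where a discriminator scores real versus imputed series. This is a generic sketch of the technique, not LGnet's exact objective; the function names are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adversarial_losses(d_real_logits, d_fake_logits, eps=1e-12):
    """Standard GAN losses for making imputed MTS resemble real MTS.

    d_real_logits: discriminator outputs on fully observed series
    d_fake_logits: discriminator outputs on series completed with estimates
    The discriminator learns to tell the two apart; the generator (the
    imputation model) is trained to fool it, pushing the completed series
    toward the global data distribution.
    """
    p_real, p_fake = sigmoid(d_real_logits), sigmoid(d_fake_logits)
    d_loss = -np.mean(np.log(p_real + eps)) - np.mean(np.log(1 - p_fake + eps))
    g_loss = -np.mean(np.log(p_fake + eps))   # generator tries to fool D
    return d_loss, g_loss
```

When the discriminator confidently separates real from imputed series, the generator loss is large, giving the imputation model a training signal even on entries with no observed ground truth.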
Summary
  • Modeling local and global temporal dynamics is very promising for MTS forecasting with missing values.
  • The authors study a new problem of MTS forecasting with missing values by exploring local and global temporal dynamics.
Tables
  • Table1: MTS forecasting performances on Beijing Air and PhysioNet
  • Table2: MTS forecasting performance of variants
  • Table3: Analysis of hyper-parameter λ
Related work
  • Various methods have been proposed for MTS forecasting, such as Autoregressive (AR), Vector Autoregression (VAR), Autoregressive Moving Average (ARMA), and standard regression models (e.g., support vector regression (Smola and Scholkopf 2004), linear regression, and regression tree methods (Chen and Guestrin 2016)). Inspired by the recent success of deep neural networks, many RNN-based methods (Lai et al. 2018; Qin et al. 2017) have been developed for MTS forecasting. Even vanilla RNNs, such as GRU (Chung et al. 2014) and LSTM (Hochreiter and Schmidhuber 1997), can significantly outperform non-deep-learning models (Chang et al. 2018). However, none of those approaches can handle input with missing values.

    To handle missing values in MTS, the simplest solution is to remove all samples with missing values, as in pairwise deletion (Marsh 1998). Obviously, such methods discard much useful information, especially under a high missing ratio (King et al. 1998). General data imputation methods such as statistical imputation (e.g., mean, median), EM-based imputation (Nelwamondo, Mohamed, and Marwala 2007), k-nearest neighbors (Friedman, Hastie, and Tibshirani 2001), and matrix factorization (Friedman, Hastie, and Tibshirani 2001) can be applied to the unobserved variables. However, those general approaches fail to model the temporal dynamics of time series. Even if MTS imputation methods, such as multivariate imputation by chained equations (Azur et al. 2011) and generative adversarial networks (Luo et al. 2018), are applied to fill in missing values first, training a forecasting model on pre-processed MTS data leads to sub-optimal results, since the temporal patterns of missing values are completely isolated from the forecasting model (Wells et al. 2013). To tackle this issue, some researchers propose end-to-end frameworks that jointly estimate missing values and forecast future MTS. Che et al. (2018) introduce GRU-D, which imputes missing values using a linear combination of statistical features. Yoon, Zame, and van der Schaar (2017) propose M-RNN, which leverages a bi-directional RNN for imputation. Cao et al. (2018) model the relationships between missing variables to simultaneously perform imputation and classification/regression in one neural graph. However, those solutions focus on localized temporal dependencies and fail to model global temporal dynamics.
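To illustrate why general imputation methods ignore temporal dynamics, compare column-mean imputation (order-agnostic) with a simple forward fill (which at least uses local temporal context). Both are toy baselines for contrast, not methods from the paper:

```python
import numpy as np

def mean_impute(x):
    """Column-wise mean imputation: ignores temporal order entirely."""
    x = x.copy()
    col_means = np.nanmean(x, axis=0)       # per-variable mean over observed steps
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = np.take(col_means, cols)
    return x

def forward_fill(x):
    """Carry the last observation forward: uses local temporal context only.

    Assumes the first time step is fully observed (a sketch, not robust code).
    """
    x = x.copy()
    for t in range(1, x.shape[0]):
        nan = np.isnan(x[t])
        x[t, nan] = x[t - 1, nan]
    return x

# 3 time steps, 2 variables, with two missing entries
series = np.array([[1.0, 10.0],
                   [np.nan, 12.0],
                   [3.0, np.nan]])
```

Mean imputation gives every gap the same value regardless of when it occurs, whereas forward fill respects the local trajectory; neither captures the global temporal patterns that the memory module is designed to exploit.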
Funding
  • This material is based upon work supported by, or in part by, the National Science Foundation (NSF) under grant #1909702
References
  • [Arjovsky, Chintala, and Bottou 2017] Arjovsky, M.; Chintala, S.; and Bottou, L. 2017. Wasserstein GAN. arXiv:1701.07875.
  • [Azur et al. 2011] Azur, M. J.; Stuart, E. A.; Frangakis, C.; and Leaf, P. J. 2011. Multiple imputation by chained equations: what is it and how does it work? International Journal of Methods in Psychiatric Research 20(1):40–49.
  • [Bengio et al. 2015] Bengio, S.; Vinyals, O.; Jaitly, N.; and Shazeer, N. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NeurIPS, 1171–1179.
  • [Box et al. 2015] Box, G. E.; Jenkins, G. M.; Reinsel, G. C.; and Ljung, G. M. 2015. Time Series Analysis: Forecasting and Control. John Wiley & Sons.
  • [Cao et al. 2018] Cao, W.; Wang, D.; Li, J.; Zhou, H.; Li, L.; and Li, Y. 2018. BRITS: Bidirectional recurrent imputation for time series. arXiv:1805.10572.
  • [Chang et al. 2018] Chang, Y.-Y.; Sun, F.-Y.; Wu, Y.-H.; and Lin, S.-D. 2018. A memory-network based solution for multivariate time-series forecasting. arXiv:1809.02105.
  • [Che et al. 2018] Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; and Liu, Y. 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports 8(1):6085.
  • [Chen and Guestrin 2016] Chen, T., and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. In KDD, 785–794. ACM.
  • [Cho et al. 2014] Cho, K.; Van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078.
  • [Chung et al. 2014] Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555.
  • [Friedman, Hastie, and Tibshirani 2001] Friedman, J.; Hastie, T.; and Tibshirani, R. 2001. The Elements of Statistical Learning, volume 1. Springer Series in Statistics. New York, NY: Springer.
  • [García-Laencina, Sancho-Gómez, and Figueiras-Vidal 2010] García-Laencina, P. J.; Sancho-Gómez, J.-L.; and Figueiras-Vidal, A. R. 2010. Pattern classification with missing data: a review. Neural Computing and Applications 19(2):263–282.
  • [Goodfellow et al. 2014] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In NeurIPS, 2672–2680.
  • [Hochreiter and Schmidhuber 1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • [King et al. 1998] King, G.; Honaker, J.; Joseph, A.; and Scheve, K. 1998. List-wise deletion is evil: what to do about missing data in political science. In APSA.
  • [Kumar et al. 2016] Kumar, A.; Irsoy, O.; Ondruska, P.; Iyyer, M.; Bradbury, J.; Gulrajani, I.; Zhong, V.; Paulus, R.; and Socher, R. 2016. Ask me anything: Dynamic memory networks for natural language processing. In ICML, 1378–1387.
  • [Lai et al. 2018] Lai, G.; Chang, W.-C.; Yang, Y.; and Liu, H. 2018. Modeling long- and short-term temporal patterns with deep neural networks. In SIGIR. ACM.
  • [Luo et al. 2018] Luo, Y.; Cai, X.; Zhang, Y.; Xu, J.; et al. 2018. Multivariate time series imputation with generative adversarial networks. In NeurIPS, 1603–1614.
  • [Marsh 1998] Marsh, H. W. 1998. Pairwise deletion for missing data in structural equation models: Nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling: A Multidisciplinary Journal 5(1).
  • [Nelwamondo, Mohamed, and Marwala 2007] Nelwamondo, F. V.; Mohamed, S.; and Marwala, T. 2007. Missing data: A comparison of neural network and expectation maximization techniques. Current Science.
  • [Qin et al. 2017] Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; and Cottrell, G. 2017. A dual-stage attention-based recurrent neural network for time series prediction. arXiv:1704.02971.
  • [Shu et al. 2018] Shu, K.; Wang, S.; Le, T.; Lee, D.; and Liu, H. 2018. Deep headline generation for clickbait detection. In ICDM. IEEE.
  • [Silva et al. 2012] Silva, I.; Moody, G.; Scott, D. J.; Celi, L. A.; and Mark, R. G. 2012. Predicting in-hospital mortality of ICU patients: The PhysioNet/Computing in Cardiology Challenge 2012. Computing in Cardiology 39:245.
  • [Smola and Scholkopf 2004] Smola, A. J., and Scholkopf, B. 2004. A tutorial on support vector regression. Statistics and Computing 14(3):199–222.
  • [Sukhbaatar et al. 2015] Sukhbaatar, S.; Weston, J.; Fergus, R.; et al. 2015. End-to-end memory networks. In NeurIPS, 2440–2448.
  • [Sun et al. 2019] Sun, Y.; Wang, S.; Hsieh, T.-Y.; Tang, X.; and Honavar, V. 2019. MEGAN: A generative adversarial network for multi-view network embedding. arXiv:1909.01084.
  • [Tang et al. 2017] Tang, J.; Wang, Y.; Zheng, K.; and Mei, Q. 2017. End-to-end learning for short text expansion. In KDD, 1105–1113. ACM.
  • [Tang et al. 2019] Tang, X.; Gong, B.; Yu, Y.; Yao, H.; Li, Y.; Xie, H.; and Wang, X. 2019. Joint modeling of dense and incomplete trajectories for citywide traffic volume inference. In WWW. ACM.
  • [Wells et al. 2013] Wells, B. J.; Chagin, K. M.; Nowacki, A. S.; and Kattan, M. W. 2013. Strategies for handling missing data in electronic health record derived data. eGEMs 1(3).
  • [Weston, Chopra, and Bordes 2014] Weston, J.; Chopra, S.; and Bordes, A. 2014. Memory networks. arXiv:1410.3916.
  • [Wu et al. 2018] Wu, X.; Shi, B.; Dong, Y.; Huang, C.; Faust, L.; and Chawla, N. V. 2018. RESTFul: Resolution-aware forecasting of behavioral time series data. In CIKM, 1073–1082. ACM.
  • [Wu et al. 2019] Wu, X.; Shi, B.; Dong, Y.; Huang, C.; and Chawla, N. V. 2019. Neural tensor factorization for temporal interaction learning. In WSDM, 537–545. ACM.
  • [Xingjian et al. 2015] Xingjian, S.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; and Woo, W.-c. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In NeurIPS, 802–810.
  • [Yao et al. 2018] Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; and Li, Z. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In AAAI.
  • [Yao et al. 2019a] Yao, H.; Liu, Y.; Wei, Y.; Tang, X.; and Li, Z. 2019a. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction. In WWW.
  • [Yao et al. 2019b] Yao, H.; Tang, X.; Wei, H.; Zheng, G.; and Li, Z. 2019b. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In AAAI.
  • [Yi et al. 2016] Yi, X.; Zheng, Y.; Zhang, J.; and Li, T. 2016. ST-MVL: Filling missing values in geo-sensory time series data. In IJCAI.
  • [Yoon, Zame, and van der Schaar 2017] Yoon, J.; Zame, W. R.; and van der Schaar, M. 2017. Multi-directional recurrent neural networks: A novel method for estimating missing data.
  • [Yu et al. 2017] Yu, L.; Zhang, W.; Wang, J.; and Yu, Y. 2017. SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI.