
# Normalizing Kalman Filters for Multivariate Time Series Analysis

NeurIPS 2020 (2020)


Abstract

This paper tackles the modelling of large, complex and multivariate time series panels in a probabilistic setting. To this extent, we present a novel approach reconciling classical state space models with deep learning methods. By augmenting state space models with normalizing flows, we mitigate imprecisions stemming from idealized assumptions…


Introduction

- In most real world applications of time series analysis, e.g., risk management in finance, cannibalization of products in retail or anomaly detection in cloud computing environments, time series are not mutually independent and an accurate modelling approach must take these dependencies into account [1].
- The classical approach [2] is to extend standard univariate models resulting in vector autoregression [3], multivariate GARCH [4] and multivariate state space models [5, 6]
- While these approaches yield useful theoretical properties, they make idealized assumptions such as Gaussianity and linear inter-dependencies, and they do not scale to even a moderate number of time series [7] due to the number of parameters that must be estimated, which is restrictive for many modern applications involving large panels of time series.

Highlights

- In this paper we propose the Normalizing Kalman Filter (NKF), a novel approach for modelling and forecasting complex multivariate time series by augmenting classical linear Gaussian state space models (LGM) with normalizing flows [10]
- Similar to [9, 8], we propose to predict the parameters Θ from the covariates using a recurrent neural network (RNN), whose recurrent function Ψ is parametrized by Φ, taking into account the possibly nonlinear relationship with the covariates xt: Θt = σ(ht; Φ), ht = Ψ(xt, ht−1; Φ), t = 1, …, T
- Unlike our NKF model, inference and likelihood computation are not tractable in the KF with Variational Auto-Encoders (KVAE), which relies on particle filters for their approximation
- In this paper we presented a simple, tractable and scalable approach to high-dimensional multivariate time series analysis, combining classical state space models with normalizing flows
- One caveat of our approach is that we no longer have identifiability w.r.t. the state space parameters: an interesting avenue of research is to work towards identifiability, e.g. by constraining the normalizing flow’s expressivity
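The recurrence Θt = σ(ht; Φ), ht = Ψ(xt, ht−1; Φ) above can be sketched in a few lines. This is a minimal illustration with a plain tanh cell and a linear readout; the weight names (`W_x`, `W_h`, `W_out`, …) are placeholders, not the architecture from the paper:

```python
import numpy as np

def rnn_ssm_params(x, h0, W_x, W_h, b, W_out, b_out):
    """Sketch of Theta_t = sigma(h_t; Phi), h_t = Psi(x_t, h_{t-1}; Phi).

    At each time step the hidden state is updated from the covariates x_t,
    and the state space parameters Theta_t are read out from the hidden
    state. All weight names are illustrative, not from the paper.
    """
    h = h0
    thetas = []
    for x_t in x:                                # iterate over time steps
        h = np.tanh(W_x @ x_t + W_h @ h + b)     # h_t = Psi(x_t, h_{t-1}; Phi)
        thetas.append(W_out @ h + b_out)         # Theta_t = sigma(h_t; Phi)
    return np.stack(thetas)                      # shape (T, dim of Theta)
```

In the actual model, σ would also constrain each parameter to its valid domain (e.g., positivity for noise variances); the linear readout here omits that detail.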

Methods

- Particle filters [16] resort to Monte Carlo approximations of these integrals, but have difficulty scaling to high dimensions.
- Other methods circumvent this by locally linearizing the nonlinear transformation [17] or by using a finite-sample approximation [18] in order to apply, in both cases, the techniques of the standard LGM, but this introduces a bias.
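For contrast, the integrals in question are available in closed form for the linear Gaussian state space model via the classic Kalman filter recursions. A minimal sketch of one predict/update step, with illustrative notation (l_t = F l_{t−1} + w_t, y_t = H l_t + v_t), not the paper's exact parametrization:

```python
import numpy as np

def kalman_step(m, P, y, F, Q, H, R):
    """One predict/update step of the Kalman filter for an LGM.

    Returns the filtered state mean/covariance and the exact Gaussian
    log-likelihood contribution of the observation y.
    """
    # Predict: propagate the state estimate through the linear dynamics.
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update: condition on the new observation y.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    resid = y - H @ m_pred                # innovation
    m_new = m_pred + K @ resid
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    # Exact Gaussian log-likelihood of y given the past.
    d = len(y)
    ll = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(S))
                 + resid @ np.linalg.inv(S) @ resid)
    return m_new, P_new, ll
```

NKF keeps this tractability: the flow is an invertible map on the observations, so the likelihood is the LGM likelihood of the inverse-transformed data plus a log-determinant correction.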

Results

- The authors follow the experimental setup proposed in [1], since it focuses on the same forecasting problem.
- The evaluation is extensive, covering relevant classical multivariate approaches as well as recent deep learning based models.
- The authors compare against the recent deep-learning based approaches GP-Copula [1] and KVAE [12], developed for handling non-Gaussian multivariate data with non-linear dependencies.
- They also compare with DeepAR [8], an autoregressive recurrent-neural-network method for univariate time series forecasting.
- See Table 1 for a summary of the compared methods along various dimensions.

Conclusion

**Discussion and Conclusion**

- In this paper the authors presented a simple, tractable and scalable approach to high-dimensional multivariate time series analysis, combining classical state space models with normalizing flows.
- The approach can capture non-linear dependencies in the data and non-Gaussian noise, while still inheriting important analytic properties of the linear Gaussian state space model.
- This model is flexible, while still retaining interesting prior structural information, paramount to good generalization in low data regimes.


- Table 1: Comparative summary of competing approaches on various parameters.
- Table 2: CRPS-Sum (lower is better), averaged over 3 runs. The case ft = id is DeepState [9], and VES can be seen as an ablation where the normalizing flow and the RNN are removed from NKF.
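The CRPS metric used in Table 2 has a standard sample-based estimator, CRPS(F, y) ≈ E|X − y| − ½ E|X − X′| with X, X′ i.i.d. forecast samples [23, 24]. A small sketch (for CRPS-Sum, one would first sum the series across the panel dimension and then score the summed series; that aggregation step is omitted here):

```python
import numpy as np

def crps_samples(samples, y):
    """Sample-based CRPS estimator: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from the forecast distribution for a single
    scalar target y; smaller is better, and 0 means a perfect point mass.
    """
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    # All pairwise absolute differences between samples, via broadcasting.
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2
```

For example, a forecast concentrated exactly on the target scores 0, while a two-point forecast {0, 2} for target 1 scores 0.5.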

Related Work

Neural networks for forecasting have seen growing attention in recent years [8, 27, 28, 29, 30, 31, 32]; we refer to [33] for an introductory overview. Most work concerns the univariate case using global models, assuming that time series are independent given the covariates and the model parameters. The family of global/local models, e.g., [34, 28], provides a more explicit way of incorporating global effects into univariate time series, without attempting to estimate the covariance structure of the data. An explicit probabilistic multivariate forecasting model is proposed in [1], which relies on Gaussian copulas to model non-Gaussian multivariate data. Further related work combines probabilistic time series models with neural networks (e.g., point/renewal processes and neural networks [35], or exponential-smoothing-based expressions [27]).

We extend the approach in [9], which uses an RNN to parametrize a state space model, to the multivariate case, alleviating the Gaussianity and linear-dependency assumptions in the observation model. The idea of taking advantage of the appealing properties of Kalman Filters (KF) [36] while relaxing their assumptions is not new. Prominent examples include the Extended Kalman Filter (EKF) [17], the Unscented Kalman Filter (UKF) [18] and Particle Filters (PF) [37], which relax the linearity and Gaussianity assumptions via approximation or sampling techniques.

The Gaussian process state space model (GPSSM) [38, 39] is a nonlinear dynamical system that extends LGMs by using GPs as the transition and/or observation mappings, but it typically assumes additive Gaussian noise. If the noise is non-Gaussian, these models again have to resort to approximation techniques similar to particle filters. Kernel Kalman Filters [40] address the linearity limitation of LGMs by defining the state space model in a reproducing kernel Hilbert space (RKHS): the random latent state and the observation variable are mapped to the RKHS, and the state dynamics and the observation model are assumed to be linear in the kernel space. Note, however, that this approach still relies on the assumption of additive Gaussian noise.

Similarly, combining KF with neural networks is not new. In addition to [9], [11] proposes to combine KF with Variational Auto-Encoders (KVAE) and [12] proposes variational approximations of the predictive distribution in nonlinear state space models. Finally, while most work on normalizing flows [13, 14, 41, 15] was presented in the i.i.d. setting, extensions to sequential data have recently been proposed [42, 43, 44].
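The normalizing-flow building block referenced throughout, e.g., the RealNVP family [13], can be illustrated with a single affine coupling layer. This is a generic sketch, not the specific flow used in NKF, and `shift`/`log_scale` stand in for the learned coupling networks:

```python
import numpy as np

def affine_coupling(z, shift, log_scale):
    """One RealNVP-style affine coupling layer (illustrative sketch).

    The first half of z passes through unchanged and parametrizes an
    affine map of the second half, so the transform is invertible and
    its log|det Jacobian| is cheap to compute.
    """
    d = len(z) // 2
    z1, z2 = z[:d], z[d:]
    y2 = z2 * np.exp(log_scale(z1)) + shift(z1)  # invertible given z1
    log_det = np.sum(log_scale(z1))              # log|det Jacobian|
    return np.concatenate([z1, y2]), log_det
```

The log-determinant is what enters the change-of-variables likelihood: the density of the transformed data is the base (here, LGM) density of the inverse image plus this correction, which is what keeps NKF's likelihood exact.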

Funding

- The strong results obtained for up to 90% of missing data demonstrate that our method encodes useful prior knowledge due to the structure induced in the LGM, rendering this method useful even in low data regimes (the same observation is made in [9] for this dataset)

Study Subjects and Analysis

Observations and forecasts can also be seen in Appendix C.1 from a viewpoint that better highlights their seasonal nature.

[Figure: (a) S1: results with (left) and without (right) NF; (b) S2: results with (left) and without (right) NF. Compared methods: VES, VAR, GARCH, DeepAR, GP-Copula, KVAE, NKF (ours).]

Deep learning based models have superior performance overall. In particular, NKF achieves the best result on 4 out of 5 datasets.

References

- David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, and Jan Gasthaus. High-dimensional multivariate forecasting with low-rank gaussian copula processes. In Advances in Neural Information Processing Systems 32, pages 6824–6834. 2019.
- Helmut Lütkepohl. New introduction to multiple time series analysis. Springer Science & Business Media, 2005.
- Ashton de Silva, Rob J Hyndman, and Ralph Snyder. The vector innovations structural time series framework: a simple approach to multivariate forecasting. Statistical Modelling, 10(4):353–374, 2010.
- Luc Bauwens, Sébastien Laurent, and Jeroen VK Rombouts. Multivariate garch models: a survey. Journal of applied econometrics, 21(1):79–109, 2006.
- Rob Hyndman, Anne Koehler, Keith Ord, and Ralph Snyder. Forecasting with Exponential Smoothing: The State Space Approach. 2008.
- James Durbin and Siem Jan Koopman. Time series analysis by state space methods, volume 38. Oxford University Press, 2012.
- Andrew J Patton. A review of copula models for economic time series. Journal of Multivariate Analysis, 110:4–18, 2012.
- David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 2019.
- Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. In Advances in Neural Information Processing Systems, pages 7785–7794, 2018.
- Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pages 1530–1538, 2015.
- Marco Fraccaro, Simon Kamronn, Ulrich Paquet, and Ole Winther. A disentangled recognition and nonlinear dynamics model for unsupervised learning. In Advances in Neural Information Processing Systems 30, pages 3601–3610. Curran Associates, Inc., 2017.
- Rahul G Krishnan, Uri Shalit, and David Sontag. Structured inference networks for nonlinear state space models. In 31st AAAI Conference on Artificial Intelligence, 2017.
- Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. In 5th International Conference on Learning Representations, ICLR 2017, 2017.
- Diederik P. Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, pages 10236–10245, 2018.
- Junier Oliva, Avinava Dubey, Manzil Zaheer, Barnabas Poczos, Ruslan Salakhutdinov, Eric Xing, and Jeff Schneider. Transformation autoregressive networks. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3898–3907. PMLR, 10–15 Jul 2018.
- Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10(Oct):2295–2328, 2009.
- Paul Zarchan and Howard Musoff. Fundamentals of Kalman Filtering: A Practical Approach, Fourth Edition. American Institute of Aeronautics and Astronautics, Inc., Reston, VA, 2015.
- Simon J Julier and Jeffrey K Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401–422, 2004.
- D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2011.
- R. H. Shumway and D. S. Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264, 1982.
- Wei Cao, Dong Wang, Jian Li, Hao Zhou, Lei Li, and Yitan Li. BRITS: bidirectional recurrent imputation for time series. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 6776–6786, 2018.
- Roy Van der Weide. Go-garch: a multivariate generalized orthogonal garch model. Journal of Applied Econometrics, 17(5):549–564, 2002.
- James E Matheson and Robert L Winkler. Scoring rules for continuous probability distributions. Management science, 22(10):1087–1096, 1976.
- Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
- Pierre Pinson and Julija Tastu. Discrimination ability of the Energy score. Number 15 in DTU Compute-Technical Report-2013. Technical University of Denmark, 2013.
- Aaditya Ramdas, Sashank Jakkam Reddi, Barnabás Póczos, Aarti Singh, and Larry A. Wasserman. On the decreasing power of kernel and distance based nonparametric hypothesis tests in high dimensions. In Blai Bonet and Sven Koenig, editors, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 3571–3577. AAAI Press, 2015.
- Slawek Smyl. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, 36(1):75–85, 2020.
- Yuyang Wang, Alex Smola, Danielle Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. Deep factors for forecasting. In International Conference on Machine Learning, pages 6607–6617, 2019.
- Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama S. Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. Probabilistic forecasting with spline quantile function RNNs. AISTATS, 2019.
- Nikolay Laptev, Jason Yosinsk, Li Li Erran, and Slawek Smyl. Time-series Extreme Event Forecasting with Neural Networks at Uber. In ICML Time Series Workshop. 2017.
- Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, page 95–104. Association for Computing Machinery, 2018.
- Boris N Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-beats: Neural basis expansion analysis for interpretable time series forecasting. arXiv preprint arXiv:1905.10437, 2019.
- Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Bernie Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, et al. Neural forecasting: Introduction and literature overview. arXiv preprint arXiv:2004.10240, 2020.
- Rajat Sen, Hsiang-Fu Yu, and Inderjit S Dhillon. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. In Advances in Neural Information Processing Systems 32, pages 4838–4847. Curran Associates, Inc., 2019.
- Ali Caner Turkmen, Yuyang Wang, and Tim Januschowski. Intermittent demand forecasting with deep renewal processes. 2019.
- Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of basic Engineering, 82(1):35–45, 1960.
- Jun S Liu and Rong Chen. Sequential monte carlo methods for dynamic systems. Journal of the American statistical association, 93(443):1032–1044, 1998.
- J. Ko and D. Fox. Learning GP-Bayes filters via Gaussian process latent variable models. Autonomous Robots, 30:3–23, 2011.
- M. P. Deisenroth, R. D. Turner, M. F. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust filtering and smoothing with gaussian processes. IEEE Transactions on Automatic Control, 57(7):1865–1871, 2012.
- Gregor H. W. Gebhardt, Andras Kupcsik, and Gerhard Neumann. The kernel Kalman rule — Efficient nonparametric inference with recursive least squares. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, page 3754–3760, 2017.
- Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 573–582, 2019.
- Junxian He, Graham Neubig, and Taylor Berg-Kirkpatrick. Unsupervised learning of syntactic structure with invertible neural projections. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
- Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard Hovy. Flowseq: Nonautoregressive conditional sequence generation with generative flow, 2019.
- Zachary Ziegler and Alexander Rush. Latent normalizing flows for discrete sequences. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 7673–7682, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
- David Simchi-Levi and Edith Simchi-Levi. We need a stress test for critical supply chains. Harvard Business Review, 2020.
- Michael Bohlke-Schneider, Shubham Kapoor, and Tim Januschowski. Resilient neural forecasting systems. In Proceedings of DEEM. ACM, 2020.
- Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C Maddix, Syama Rangapuram, David Salinas, and Jasper Schulz. GluonTS: Probabilistic Time Series Models in Python. Journal of Machine Learning Research, to appear.
- Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. CoRR, abs/1703.07015, 2017.
- Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2017.
- Gábor J Székely. E-statistics: The energy of statistical samples. Bowling Green State University, Department of Mathematics and Statistics Technical Report, 3(05):1–18, 2003.
