# Concentration Bounds for Co-occurrence Matrices of Markov Chains

NeurIPS 2020.

Abstract:

Co-occurrence statistics for sequential data are common and important data signals in machine learning, which provide rich correlation and clustering information about the underlying object space. We give the first bound on the convergence rate of estimating the co-occurrence matrix of a regular (aperiodic and irreducible) finite Markov chain.

Introduction

- Co-occurrence statistics are common and important data signals in machine learning
- They provide rich correlation and clustering information about the underlying object space, such as word co-occurrence in natural language processing [26, 27, 28, 22, 29], vertex co-occurrence in graph learning [30, 36, 14, 15, 9, 31], item co-occurrence in recommender systems [35, 24, 3], action co-occurrence in reinforcement learning [38], and emission co-occurrence of hidden Markov models [17].
- Mikolov et al. [27] use word sequences to learn word embeddings
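Concretely, these statistics count how often two objects appear near each other in a sequence. A minimal sketch of within-window co-occurrence counting (function and variable names are illustrative; the paper's Algorithm 1 additionally normalizes the counts):

```python
import numpy as np

def cooccurrence_counts(seq, n, window=2):
    """Count pairs (seq[i], seq[j]) with 0 < |i - j| <= window.

    seq: integer object/state ids in [0, n); returns an n x n count matrix.
    """
    C = np.zeros((n, n))
    for i, u in enumerate(seq):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                C[u, seq[j]] += 1
    return C

# Counting both directions of each pair makes C symmetric by construction.
C = cooccurrence_counts([0, 1, 2, 1, 0], n=3, window=1)
```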

Highlights

- Co-occurrence statistics are common and important data signals in machine learning
- We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via an ergodic Markov chain, generalizing the undirected regular graph case studied by Garg et al. [12]
- The main technical contribution of our work is to prove a Chernoff-type bound for sums of matrix-valued random variables sampled via an ergodic Markov chain, and we show that the problem of estimating co-occurrence matrices is a non-trivial application of the Chernoff-type bound
- Given a regular Markov chain with n states and mixing time τ, we need a trajectory of length O(τ/ε²) to achieve an estimator of the co-occurrence matrix with error bound ε
- Our work leads to some natural future questions: Is the bound tight? Our analysis of the convergence rate of co-occurrence matrices relies on a union bound, which probably gives a loose bound
- Can we find more applications of the matrix Chernoff bound for ergodic Markov chains? We believe Theorem 2 could have further applications, e.g., in reinforcement learning which involves Markov chains
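As a toy illustration of what such a matrix Chernoff bound asserts (a sketch with made-up choices of chain and function, not the paper's construction), the empirical mean of a matrix-valued function sampled along an ergodic chain concentrates around its stationary expectation:

```python
import numpy as np

rng = np.random.default_rng(1)

# An ergodic two-state chain and its stationary distribution pi (pi P = pi).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])

# A Hermitian matrix-valued function of the state, with bounded spectral norm.
F = [np.array([[1.0, 0.0], [0.0, -1.0]]),
     np.array([[0.0, 1.0], [1.0, 0.0]])]
mean_F = pi[0] * F[0] + pi[1] * F[1]      # stationary expectation of F

def sample_mean(L):
    """(1/L) * sum_i F[X_i] along one L-step trajectory of the chain."""
    S = np.zeros((2, 2))
    x = 0
    for _ in range(L):
        S += F[x]
        x = rng.choice(2, p=P[x])
    return S / L

# The deviation ||sample_mean(L) - mean_F|| shrinks as the trajectory grows.
```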

Methods

- The authors show experiments to illustrate the exponentially fast convergence rate of estimating co-occurrence matrices of regular Markov chains.
- For each Markov chain and each trajectory length L from the set {10, 10^2, ..., 10^7}, the authors measure the approximation error of the co-occurrence matrix C constructed by Algorithm 1 from an L-step random walk sampled from the chain.
- Across all four datasets, the observed exponentially fast convergence rates match what the bounds predict in Theorem 1.
- The authors discuss the observations for each of these datasets
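On a toy chain, the measurement in this experiment can be sketched as follows. The paper's datasets and full grid up to 10^7 steps are replaced here by a three-state lazy walk, and a simplified window-1, unsymmetrized co-occurrence (whose limit is diag(π)P) stands in for Algorithm 1's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regular chain: a lazy random walk on a triangle (aperiodic, irreducible).
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.full(3, 1 / 3)            # its stationary distribution
C_true = pi[:, None] * P          # limit of the window-1 transition counts

def error(L):
    """Spectral-norm error of the co-occurrence estimate from one L-step walk."""
    C = np.zeros_like(P)
    x = 0
    for _ in range(L):
        y = rng.choice(3, p=P[x])
        C[x, y] += 1
        x = y
    return np.linalg.norm(C / L - C_true, 2)

# Error shrinks roughly like 1/sqrt(L) as the trajectory grows.
errs = [error(L) for L in (100, 1000, 10000)]
```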

Conclusion

In this paper, the authors analyze the convergence rate of estimating the co-occurrence matrix of a regular Markov chain. This problem can be formalized as the convergence of stochastic gradient descent with non-i.i.d. but Markovian random samples. Can we find more applications of the matrix Chernoff bound for ergodic Markov chains? The authors believe Theorem 2 could have further applications, e.g., in reinforcement learning, which involves Markov chains.


Study subjects and analysis

datasets: 4

The relationship between trajectory length L and approximation error is shown in Figure 1 (in log-log scale). Across all four datasets, the observed exponentially fast convergence rates match what the bounds predict in Theorem 1. The authors then discuss their observations for each of these datasets.

Reference

- Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569–579, 2002.
- Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- Oren Barkan and Noam Koenigstein. Item2vec: neural item embedding for collaborative filtering. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2016.
- Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of data science. Cambridge University Press, 2020.
- Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Efficient sampling for Gaussian graphical models via spectral sparsification. In COLT ’15, pages 364–390, 2015.
- Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496, 2015.
- Herman Chernoff et al. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–507, 1952.
- Kai-Min Chung, Henry Lam, Zhenming Liu, and Michael Mitzenmacher. Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified. In 29th International Symposium on Theoretical Aspects of Computer Science, page 124, 2012.
- Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD ’17, 2017.
- JJ Dongarra, JR Gabriel, DD Koelling, and JH Wilkinson. The eigenvalue problem for Hermitian matrices with time reversal symmetry. Linear Algebra and its Applications, 60:27–42, 1984.
- James Allen Fill. Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. The Annals of Applied Probability, pages 62–87, 1991.
- Ankit Garg, Yin Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander chernoff bound. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1102–1114, 2018.
- David Gillman. A Chernoff bound for random walks on expander graphs. SIAM Journal on Computing, 27(4):1203–1220, 1998.
- Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD ’16, pages 855–864. ACM, 2016.
- Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in neural information processing systems, pages 1024–1034, 2017.
- Alexander D Healy. Randomness-efficient sampling within nc. Computational Complexity, 17 (1):3–37, 2008.
- Kejun Huang, Xiao Fu, and Nicholas Sidiropoulos. Learning hidden Markov models from pairwise co-occurrences with application to topic modeling. In International Conference on Machine Learning, pages 2068–2077, 2018.
- Nabil Kahale. Large deviation bounds for Markov chains. Combinatorics, Probability and Computing, 6(4):465–474, 1997.
- Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR ’17, 2017.
- Carlos A León, François Perron, et al. Optimal Hoeffding bounds for discrete reversible Markov chains. The Annals of Applied Probability, 14(2):958–970, 2004.
- David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
- Omer Levy and Yoav Goldberg. Neural Word Embedding as Implicit Matrix Factorization. In NIPS ’14, pages 2177–2185. 2014.
- Pascal Lezaud. Chernoff-type bound for finite Markov chains. Annals of Applied Probability, pages 849–867, 1998.
- Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M Blei. Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In Proceedings of the 10th ACM conference on recommender systems, pages 59–66, 2016.
- Milena Mihail. Conductance and convergence of Markov chains: a combinatorial treatment of expanders. In FOCS, volume 89, pages 526–531, 1989.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop ’13, 2013.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS’ 13, pages 3111–3119. 2013.
- Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751, 2013.
- Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD ’14, pages 701–710. ACM, 2014.
- Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM ’18, pages 459–467. ACM, 2018.
- Shravas Rao and Oded Regev. A sharp tail bound for the expander random sampler. arXiv preprint arXiv:1703.10205, 2017.
- Mark Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164 (1):60–72, 1999.
- Thomas Sauerwald and Luca Zanetti. Random walks on dynamic graphs: Mixing times, hitting times, and return probabilities. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
- Guy Shani, David Heckerman, and Ronen I Brafman. An mdp-based recommender system. Journal of Machine Learning Research, 6(Sep):1265–1295, 2005.
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW ’15, pages 1067–1077, 2015.
- Lei Tang and Huan Liu. Relational learning via latent social dimensions. In KDD ’09, pages 817–826. ACM, 2009.
- Guy Tennenholtz and Shie Mannor. The natural language of actions. In International Conference on Machine Learning, pages 6196–6205, 2019.
- Joel A Tropp. An introduction to matrix concentration inequalities. arXiv preprint arXiv:1501.01571, 2015.
- Roy Wagner. Tail estimates for sums of variables sampled by a random walk. Combinatorics, Probability and Computing, 17(2):307–316, 2008.
- Avi Wigderson and David Xiao. A randomness-efficient sampler for matrix-valued functions and applications. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS’05), pages 397–406. IEEE, 2005.
- Geoffrey Wolfer and Aryeh Kontorovich. Estimating the mixing time of ergodic Markov chains. In Conference on Learning Theory, pages 3120–3159, 2019.
