Concentration Bounds for Co-occurrence Matrices of Markov Chains

NeurIPS 2020.

Keywords:
random walk, ergodic Markov chain, regular Markov chain, co-occurrence statistic
Abstract:

Co-occurrence statistics for sequential data are common and important data signals in machine learning, which provide rich correlation and clustering information about the underlying object space. We give the first bound on the convergence rate of estimating the co-occurrence matrix of a regular (aperiodic and irreducible) finite Markov chain from a single trajectory.

Introduction
  • Co-occurrence statistics are common and important data signals in machine learning
  • They provide rich correlation and clustering information about the underlying object space, such as word co-occurrence in natural language processing [26, 27, 28, 22, 29], vertex co-occurrence in graph learning [30, 36, 14, 15, 9, 31], item co-occurrence in recommender systems [35, 24, 3], action co-occurrence in reinforcement learning [38], and emission co-occurrence of hidden Markov models [17].
  • Mikolov et al. [27] use word sequences to learn word embeddings; a minimal sketch of the windowed co-occurrence count underlying such methods follows this list.
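As an illustration of these co-occurrence statistics, the sketch below counts symmetric windowed co-occurrences in a token sequence. The window size and the toy sentence are illustrative assumptions, not the paper's Algorithm 1.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Symmetric windowed co-occurrence counts over a token sequence.

    `window` plays the role of a window size T; both it and the toy
    data below are illustrative assumptions, not the paper's setup.
    """
    counts = Counter()
    for i, w in enumerate(tokens):
        # Pair the current token with each of the next `window` tokens.
        for u in tokens[i + 1 : i + 1 + window]:
            counts[(w, u)] += 1
            counts[(u, w)] += 1  # keep the statistic symmetric
    return counts

# Example: word co-occurrence in a toy sentence.
print(cooccurrence_counts("the cat sat on the mat".split(), window=2))
```

The same counting applies verbatim to vertex sequences from random walks or item sequences from user sessions; only the tokens change.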
Highlights
  • Co-occurrence statistics are common and important data signals in machine learning
  • We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via an ergodic Markov chain, generalizing the undirected regular graph case studied by Garg et al. [12]
  • The main technical contribution of our work is to prove a Chernoff-type bound for sums of matrix-valued random variables sampled via an ergodic Markov chain, and we show that the problem of estimating co-occurrence matrices is a non-trivial application of the Chernoff-type bound
  • Given a regular Markov chain with n states and mixing time τ, we need a trajectory of length O(τ/ε²) to achieve an estimator of the co-occurrence matrix with error bound ε (restated schematically after this list)
  • Our work leads to some natural future questions: Is the bound tight? Our analysis of the convergence rate of co-occurrence matrices relies on a union bound, which probably gives a loose bound
  • Can we find more applications of the matrix Chernoff bound for ergodic Markov chains? We believe Theorem 2 could have further applications, e.g., in reinforcement learning, which involves Markov chains
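Read schematically (constants and logarithmic factors suppressed; this is a paraphrase, not the paper's exact Theorem 1), the sample-complexity claim above takes the following shape:

```latex
% Schematic paraphrase of the sample-complexity claim (not the exact theorem):
% for a regular Markov chain with n states and mixing time \tau, a single
% trajectory of length
\[
  L \;=\; \tilde{O}\!\left(\frac{\tau}{\epsilon^{2}}\right)
\]
% suffices for the estimator \hat{C} of Algorithm 1 to satisfy
\[
  \Pr\Bigl[\,\bigl\|\hat{C} - C\bigr\| \le \epsilon\,\Bigr] \;\ge\; 1 - o(1),
\]
% where C denotes the asymptotic co-occurrence matrix of the chain.
```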
Methods
  • The authors show experiments to illustrate the exponentially fast convergence rate of estimating co-occurrence matrices of regular Markov chains.
  • For each Markov chain and each trajectory length L from the set {10, 10², …, 10⁷}, the authors measure the approximation error of the co-occurrence matrix C constructed by Algorithm 1 from an L-step random walk sampled from the chain.
  • Across all four datasets, the observed exponentially fast convergence rates match what the bounds predict in Theorem 1.
  • The authors discuss the observations for each of these datasets; a toy version of this experiment is sketched below.
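To make the experimental protocol concrete, here is a toy version under stated assumptions: a lazy random walk on a 10-cycle stands in for the paper's Markov chains, the trajectory lengths are shortened so the script runs quickly, and the asymptotic matrix C = (1/2T) Σᵣ₌₁..T (ΠPʳ + (ΠPʳ)ᵀ) with Π = diag(π) matches our reading of the paper's definition rather than quoting it.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small regular (irreducible and aperiodic) chain: the lazy random walk
# on a 10-cycle. The chain, T, and the trajectory lengths are toy choices.
n, T = 10, 2
P = np.zeros((n, n))
for v in range(n):
    P[v, v] = 0.5                      # laziness guarantees aperiodicity
    P[v, (v - 1) % n] = 0.25
    P[v, (v + 1) % n] = 0.25
pi = np.full(n, 1.0 / n)               # stationary distribution of this chain

# Asymptotic co-occurrence matrix, assuming the definition
# C = (1/2T) * sum_{r=1..T} (Pi P^r + (Pi P^r)^T) with Pi = diag(pi).
A = sum(np.diag(pi) @ np.linalg.matrix_power(P, r) for r in range(1, T + 1))
C_true = (A + A.T) / (2 * T)

def sample_walk(P, L, rng):
    """Sample an L-step trajectory (started uniformly, not from pi)."""
    walk = [rng.integers(len(P))]
    for _ in range(L - 1):
        walk.append(rng.choice(len(P), p=P[walk[-1]]))
    return walk

def cooccurrence_matrix(walk, n, T):
    """Empirical windowed co-occurrence matrix, normalized to sum to 1."""
    C = np.zeros((n, n))
    for i in range(len(walk) - T):
        for r in range(1, T + 1):
            C[walk[i], walk[i + r]] += 1.0
            C[walk[i + r], walk[i]] += 1.0
    return C / (2 * T * (len(walk) - T))

for L in [10, 10**2, 10**3, 10**4, 10**5]:
    C_hat = cooccurrence_matrix(sample_walk(P, L, rng), n, T)
    print(L, np.linalg.norm(C_hat - C_true, 2))  # error should shrink with L
```

On this toy chain the printed spectral-norm error should drop steadily as L grows, mirroring the trend the paper reports for its four datasets in Figure 1.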
Conclusion
  • Conclusion and Future Work

    In this paper, the authors analyze the convergence rate of estimating the co-occurrence matrix of a regular Markov chain.
  • This problem can be formalized as the convergence of stochastic gradient descent with non-i.i.d. but Markovian random samples. Can the authors find more applications of the matrix Chernoff bound for ergodic Markov chains? They believe Theorem 2 could have further applications, e.g., in reinforcement learning, which involves Markov chains. A toy instance of the Markovian-SGD formalization is sketched below.
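A toy instance of that formalization (the chain, the feature vector, and the quadratic loss are all illustrative assumptions): SGD on a quadratic objective whose samples are consecutive, hence correlated, states of a Markov chain still converges to the stationary-expectation minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Markovian data stream: lazy random walk on a 5-cycle (toy assumption).
n = 5
P = np.zeros((n, n))
for v in range(n):
    P[v, v] = 0.5
    P[v, (v - 1) % n] = 0.25
    P[v, (v + 1) % n] = 0.25
features = rng.normal(size=n)          # one scalar "observation" per state

# SGD for w ~ E_pi[features]: the gradient of 0.5*(w - x)^2 is (w - x).
# Samples are consecutive chain states, hence correlated, not i.i.d.
w, v = 0.0, 0
for t in range(1, 100_000):
    v = rng.choice(n, p=P[v])          # next Markovian sample
    w -= (1.0 / t) * (w - features[v]) # step size 1/t

print(w, features.mean())              # pi is uniform, so these roughly agree
```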
Study subjects and analysis
Datasets: 4
The relationship between trajectory length L and approximation error is shown in Figure 1 (in log-log scale). Across all four datasets, the observed exponentially fast convergence rates match what our bounds predict in Theorem 1. Below we discuss our observations for each of these datasets.

Reference
  • Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569–579, 2002.
  • Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
  • Oren Barkan and Noam Koenigstein. Item2vec: Neural item embedding for collaborative filtering. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2016.
  • Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science. Cambridge University Press, 2020.
  • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Efficient sampling for Gaussian graphical models via spectral sparsification. In COLT ’15, pages 364–390, 2015.
  • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496, 2015.
  • Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–507, 1952.
  • Kai-Min Chung, Henry Lam, Zhenming Liu, and Michael Mitzenmacher. Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified. In 29th International Symposium on Theoretical Aspects of Computer Science, page 124, 2012.
  • Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD ’17, 2017.
  • J. J. Dongarra, J. R. Gabriel, D. D. Koelling, and J. H. Wilkinson. The eigenvalue problem for Hermitian matrices with time reversal symmetry. Linear Algebra and its Applications, 60:27–42, 1984.
  • James Allen Fill. Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. The Annals of Applied Probability, pages 62–87, 1991.
  • Ankit Garg, Yin Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander Chernoff bound. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1102–1114, 2018.
  • David Gillman. A Chernoff bound for random walks on expander graphs. SIAM Journal on Computing, 27(4):1203–1220, 1998.
  • Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD ’16, pages 855–864. ACM, 2016.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.
  • Alexander D. Healy. Randomness-efficient sampling within NC. Computational Complexity, 17(1):3–37, 2008.
  • Kejun Huang, Xiao Fu, and Nicholas Sidiropoulos. Learning hidden Markov models from pairwise co-occurrences with application to topic modeling. In International Conference on Machine Learning, pages 2068–2077, 2018.
  • Nabil Kahale. Large deviation bounds for Markov chains. Combinatorics, Probability and Computing, 6(4):465–474, 1997.
  • Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR ’17, 2017.
  • Carlos A. León and François Perron. Optimal Hoeffding bounds for discrete reversible Markov chains. The Annals of Applied Probability, 14(2):958–970, 2004.
  • David A. Levin and Yuval Peres. Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017.
  • Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In NIPS ’14, pages 2177–2185, 2014.
  • Pascal Lezaud. Chernoff-type bound for finite Markov chains. The Annals of Applied Probability, pages 849–867, 1998.
  • Dawen Liang, Jaan Altosaar, Laurent Charlin, and David M. Blei. Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 59–66, 2016.
  • Milena Mihail. Conductance and convergence of Markov chains: A combinatorial treatment of expanders. In FOCS ’89, pages 526–531, 1989.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR Workshop ’13, 2013.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS ’13, pages 3111–3119, 2013.
  • Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751, 2013.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
  • Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD ’14, pages 701–710. ACM, 2014.
  • Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM ’18, pages 459–467. ACM, 2018.
  • Shravas Rao and Oded Regev. A sharp tail bound for the expander random sampler. arXiv preprint arXiv:1703.10205, 2017.
  • Mark Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60–72, 1999.
  • Thomas Sauerwald and Luca Zanetti. Random walks on dynamic graphs: Mixing times, hitting times, and return probabilities. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  • Guy Shani, David Heckerman, and Ronen I. Brafman. An MDP-based recommender system. Journal of Machine Learning Research, 6(Sep):1265–1295, 2005.
  • Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW ’15, pages 1067–1077, 2015.
  • Lei Tang and Huan Liu. Relational learning via latent social dimensions. In KDD ’09, pages 817–826. ACM, 2009.
  • Guy Tennenholtz and Shie Mannor. The natural language of actions. In International Conference on Machine Learning, pages 6196–6205, 2019.
  • Joel A. Tropp. An introduction to matrix concentration inequalities. arXiv preprint arXiv:1501.01571, 2015.
  • Roy Wagner. Tail estimates for sums of variables sampled by a random walk. Combinatorics, Probability and Computing, 17(2):307–316, 2008.
  • Avi Wigderson and David Xiao. A randomness-efficient sampler for matrix-valued functions and applications. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’05), pages 397–406. IEEE, 2005.
  • Geoffrey Wolfer and Aryeh Kontorovich. Estimating the mixing time of ergodic Markov chains. In Conference on Learning Theory, pages 3120–3159, 2019.