# Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet Log-Sobolev

NeurIPS 2020

Abstract

Sampling is a fundamental and arguably very important task with numerous applications in Machine Learning. One approach to sample from a high dimensional distribution $e^{-f}$ for some function $f$ is the Langevin Algorithm (LA). Recently, there has been a lot of progress in showing fast convergence of LA even in cases where $f$ is non-…

Introduction

- The authors focus on the problem of sampling from a distribution $e^{-f(x)}$ supported on a Riemannian manifold M with standard volume measure.
- The classic Riemannian Langevin algorithm (RLA) is the standard approach to this problem, though it is not defined globally on the manifold.
- Beyond the classic applications of the Riemannian Langevin Algorithm (RLA), recent progress in [12, 39] shows that sampling from a distribution on a manifold has applications in matrix factorization, principal component analysis, matrix completion, mean-field and continuous games, and GANs. Formally, a game with a finite number of agents is called continuous if the strategy spaces are continuous, i.e., either finite-dimensional differentiable manifolds or infinite-dimensional Banach manifolds [43, 44, 12].
- A mixed strategy is a probability distribution on the strategy manifold, and mixed Nash equilibria can be approximated by Langevin dynamics.

Highlights

- We focus on the problem of sampling from a distribution $e^{-f(x)}$ supported on a Riemannian manifold M with standard volume measure
- We propose a Geodesic Langevin Algorithm (GLA) as a natural generalization of the unadjusted Langevin algorithm (ULA) from Euclidean space to a manifold M
- The equivalence between Langevin dynamics and optimization in the space of densities is based on the result of [26, 59] that the Langevin dynamics captures the gradient flow of the relative entropy functional in the space of densities with the Wasserstein metric
- In this paper we focus on the problem of sampling from a distribution on a Riemannian manifold and propose the Geodesic Langevin Algorithm
- By leveraging the geometric meaning of GLA, we provide a non-asymptotic convergence guarantee in the sense that the KL divergence decreases fast along the iterations of GLA
- By assuming that we have full access to the geometric data of the manifold, we can control the bias between the stationary distribution of GLA and the target distribution to be arbitrarily small through the choice of stepsize
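The GLA update itself is not reproduced in this summary, but the bullets above describe it as ULA with the Euclidean step replaced by the exponential map, i.e. $x_{k+1} = \exp_{x_k}(-\eta\, \mathrm{grad} f(x_k) + \sqrt{2\eta}\,\xi_k)$ with tangent-space Gaussian noise $\xi_k$. A minimal sketch of that update on the unit sphere $S^2$, where the exponential map has a closed form; the target $f(x) = -5\langle a, x\rangle$ and all function names below are illustrative, not from the paper:

```python
import numpy as np

def exp_map_sphere(x, v, eps=1e-12):
    """Exponential map on the unit sphere: follow the geodesic from x with tangent velocity v."""
    r = np.linalg.norm(v)
    return x if r < eps else np.cos(r) * x + np.sin(r) * (v / r)

def gla_step(x, grad_f, eta, rng):
    """One sketched GLA step: Riemannian gradient plus tangent noise, pushed through exp_x."""
    P = np.eye(x.size) - np.outer(x, x)       # projector onto the tangent space at x
    g = P @ grad_f(x)                         # Riemannian gradient = projected Euclidean gradient
    xi = P @ rng.standard_normal(x.size)      # Gaussian noise in the tangent space
    return exp_map_sphere(x, -eta * g + np.sqrt(2.0 * eta) * xi)

# Target e^{-f} with f(x) = -5 <a, x>: concentrates near the pole a on S^2.
# For this exact target, E[x_3] = coth(5) - 1/5, roughly 0.80.
a = np.array([0.0, 0.0, 1.0])
grad_f = lambda x: -5.0 * a                   # Euclidean gradient of f
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])                 # start on the equator
zs = []
for k in range(20_000):
    x = gla_step(x, grad_f, 0.05, rng)
    if k >= 2_000:                            # discard burn-in
        zs.append(x[2])
mean_z = float(np.mean(zs))
```

On the sphere, projecting an ambient Gaussian onto the tangent space plays the role of the manifold Brownian increment, and the exponential map keeps every iterate exactly on the manifold, which is the "defined globally" property the conclusion attributes to GLA.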

Results

- The equivalence between Langevin dynamics and optimization in the space of densities is based on the result of [26, 59] that the Langevin dynamics captures the gradient flow of the relative entropy functional in the space of densities with the Wasserstein metric.
- The stationary solution of equation (3) is $e^{-f(x)}$, which minimizes the entropy-regularized functional $L(\rho)$; the optimization problem over the space of densities thus boils down to tracking the evolution of $\rho(x, t)$ defined by equation (3)
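Equation (3) is not shown in this summary; in the standard Wasserstein gradient-flow picture of [26, 59] that these bullets refer to, the evolving density obeys the Fokker-Planck equation and $L(\rho)$ is the free energy, which (assuming $e^{-f}$ is normalized) equals the KL divergence to the target. A sketch of those standard identities, with notation assumed rather than taken from the paper:

```latex
% Fokker-Planck evolution of \rho(x, t) -- the role played by equation (3):
\partial_t \rho = \nabla \cdot (\rho \nabla f) + \Delta \rho,
\qquad
% free energy = potential energy + negative entropy
%             = KL divergence to the (normalized) target:
L(\rho) = \int f \rho \, dx + \int \rho \log \rho \, dx
        = H\!\left(\rho \,\middle|\, e^{-f}\right),
\qquad
\rho^*(x) = e^{-f(x)}.
```

Under a log-Sobolev inequality with constant $\alpha$ for $\nu = e^{-f}$, this flow contracts in KL divergence, $H(\rho_t \,|\, \nu) \le e^{-2\alpha t} H(\rho_0 \,|\, \nu)$, which is the continuous-time analogue of the non-asymptotic guarantee stated for GLA.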

Conclusion

- In this paper the authors focus on the problem of sampling from a distribution on a Riemannian manifold and propose the Geodesic Langevin Algorithm.
- GLA modifies the Riemannian Langevin algorithm by using the exponential map, so that the algorithm is defined globally.
- By leveraging the geometric meaning of GLA, the authors provide a non-asymptotic convergence guarantee in the sense that the KL divergence decreases fast along the iterations of GLA.
- By assuming full access to the geometric data of the manifold, the authors can make the bias between the stationary distribution of GLA and the target distribution arbitrarily small through the choice of stepsize.
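The stepsize-bias tradeoff in the last bullet can already be seen in the Euclidean special case (M = ℝ, f(x) = x²/2, target N(0, 1)), where the ULA chain $x_{k+1} = (1-\eta)x_k + \sqrt{2\eta}\,\xi_k$ has stationary variance $1/(1-\eta/2)$, so the discretization bias vanishes as $\eta \to 0$. A small illustrative simulation, not from the paper:

```python
import numpy as np

def ula_variance(eta, n_steps=200_000, burn=1_000, seed=0):
    """Estimate the stationary variance of ULA for f(x) = x^2 / 2 (target N(0, 1))."""
    rng = np.random.default_rng(seed)
    x, xs = 0.0, []
    for k in range(n_steps):
        # ULA step: x - eta * f'(x) + sqrt(2 * eta) * xi, with f'(x) = x
        x = (1.0 - eta) * x + np.sqrt(2.0 * eta) * rng.standard_normal()
        if k >= burn:
            xs.append(x)
    return float(np.var(xs))

# Discretization bias in the stationary variance shrinks with the step size:
bias_large = abs(ula_variance(0.2) - 1.0)    # analytic variance 1/(1 - 0.1), bias about 0.11
bias_small = abs(ula_variance(0.02) - 1.0)   # analytic variance 1/(1 - 0.01), bias about 0.01
```

The same qualitative picture is what the paper claims for GLA on a manifold: the algorithm is biased for any fixed stepsize, but the bias is controlled by the stepsize choice.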


Related work

- The unadjusted Langevin algorithm (ULA) for sampling from a strongly log-concave density in Euclidean space has been studied extensively. Bounds for ULA are known from [7, 9, 11, 13], and the case where f is strongly convex with a Lipschitz gradient is studied by [10, 14, 16]. Because of discretization, ULA is biased: it converges to a limit distribution that differs from that of the continuous Langevin equation. The Metropolis-Hastings correction is widely used to remove this bias, e.g. [46, 17]; a simplified correction with a smaller bias than ULA, the symmetrized Langevin algorithm, is proposed by [59]; and convergence results for the proximal Langevin algorithm (PLA) are obtained in [60].
- When the target distribution is log-concave, other algorithms are also proven to converge rapidly, e.g. Langevin Monte Carlo [3], the ball walk and hit-and-run [27, 32, 34, 33], and Hamiltonian Monte Carlo [15, 55, 38]. The underdamped Langevin dynamics under a log-Sobolev inequality is studied by [36], which provides an iteration complexity for the discrete-time algorithm with better dependence on the dimension. A coupling approach is used by [18] to quantify convergence to equilibrium for Langevin dynamics; it yields contractions in a particular Wasserstein distance and gives precise bounds for convergence to equilibrium. Densities that are neither smooth nor log-concave are studied in [35], where asymptotic consistency guarantees are provided, and [8, 37, 42] provide convergence bounds in the Wasserstein distance.
- Stochastic gradient Langevin dynamics, with applications in Bayesian learning, is proposed by [58], and Langevin Monte Carlo under a weaker smoothness assumption is studied by [6]. To improve sample quality, [21] develops a theory of weak convergence for the kernel Stein discrepancy based on Stein's method. In general, sampling from non-log-concave densities is hard: [19] gives an exponential lower bound on the number of queries required.

References

- Naman Agarwal, Nicolas Boumal, Brian Bullins, and Coralia Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming, 2020.
- G.G. Batrouni, H. Kawai, and Pietro Rossi. Coordinate-independent formulation of the langevin equation. Journal of Mathematical Physics, 27, 1986.
- Espen Bernton. Langevin monte carlo and jko splitting. In COLT, 2018.
- Marcus Brubaker, Mathieu Salzmann, and Raquel Urtasun. A family of mcmc methods on implicitly defined manifolds. In AISTATS, 2012.
- Simon Byrne and Mark Girolami. Geodesic monte carlo on embedded manifolds. Scandinavian Journal of Statistics, Theory and Applications, 40(4), 2013.
- NS. Chatterji, J. Diakonikolas, MI. Jordan, and Peter Bartlett. Langevin monte carlo without smoothness. In arXiv:1905.13285, 2019.
- Xiang Cheng and Peter Bartlett. Convergence of langevin mcmc in kl-divergence. Proceedings of Machine Learning Research, 83, 2018.
- Xiang Cheng, Niladri S Chatterji, Yasin Abbasi-Yadkori, Peter Bartlett, and Michael I Jordan. Sharp convergence rates for langevin dynamics in the nonconvex setting. arXiv:1805.01648, 2018.
- Arnak Dalalyan. Further and stronger analogy between sampling and optimization: Langevin monte carlo and gradient descent. Proceedings of the Conference on Learning Theory, 2017.
- Arnak Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 2017.
- Arnak Dalalyan and Avetik Karagulyan. User-friendly guarantees for the langevin monte carlo with inaccurate gradient. Stochastic Processes and their Applications, 2019.
- Carles Domingo-Enrich, Samy Jelassi, Arthur Mensch, Grant Rotskoff, and Joan Bruna. A mean-field analysis of two-player zero-sum games. arXiv:2002.06277, 2020.
- Alain Durmus, Szymon Majewski, and Blazej Miasojedow. Analysis of langevin monte carlo via convex optimization. In arXiv:1802.09188, 2018.
- Alain Durmus and Eric Moulines. Nonasymptotic convergence analysis for the unadjusted langevin algorithm. The Annals of Applied Probability, 27(3), 2017.
- Alain Durmus, Eric Moulines, and Eero Saksman. On the convergence of hamiltonian monte carlo. In arXiv:1705.00166, 2017.
- Alain Durmus and Eric Moulines. High-dimensional bayesian inference via the unadjusted langevin algorithm. Bernoulli, 25(4A), 2019.
- Raaz Dwivedi, Yuansi Chen, Martin Wainwright, and Bin Yu. Log-concave sampling: Metropolis-Hastings algorithms are fast! Proceedings of the Conference on Learning Theory, 2018.
- Andreas Eberle, Arnaud Guillin, and Raphael Zimmer. Coupling and quantitative contraction rates for langevin dynamics. In arXiv:1703.01617, 2018.
- Rong Ge, Holden Lee, and Andrej Risteski. Beyond log-concavity: Provable guarantees for sampling multi-modal distributions using simulated tempering langevin monte carlo. In NeurIPS, 2018.
- Mark Girolami and Ben Calderhead. Riemann manifold langevin and hamiltonian monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 2011.
- Jackson Gorham and Lester Mackey. Measuring sample quality with kernels. In ICML, 2017.
- Leonard Gross. Logarithmic sobolev inequalities and contractivity properties of semigroups. Lecture Notes in Maths, 1563, 1993.
- Ya-Ping Hsieh, Ali Kavis, Paul Rolland, and Volkan Cevher. Mirrored langevin dynamics. In NeurIPS, 2018.
- Elton P. Hsu. Stochastic Analysis on Manifolds. American Mathematical Society, 2002.
- Alfredo Garbuno-Inigo, Nikolas Nusken, and Sebastian Reich. Affine invariant interacting langevin dynamics for bayesian inference. arXiv:1912.02859, 2019.
- Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the fokker-planck equation. SIAM Journal on Mathematical Analysis, 29(1), 1998.
- R. Kannan, L. Lovasz, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 11, 1997.
- Michel Ledoux. Concentration of measure and logarithmic sobolev inequalities. Seminaire de probabilites, 33, 1999.
- John Lee. Introduction to Riemannian Manifolds, volume 176 GTM. Springer, 2018.
- Chang Liu, Jun Zhu, and Yang Song. Stochastic gradient geodesic mcmc methods. In NIPS, 2016.
- Chang Liu, Jingwei Zhuo, and Jun Zhu. Understanding mcmc dynamics as flows on the wasserstein space. In ICML, 2019.
- L. Lovasz and S. Vempala. Fast algorithm for logconcave functions: sampling, rounding, integration and optimization. In FOCS, 2006.
- L. Lovasz and S. Vempala. Hit-and-run from a corner. SIAM Journal on Computing, 35(4), 2006.
- L. Lovasz and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 30(3), 2007.
- Tung Duy Luu, Jalal Fadili, and Christophe Chesneau. Sampling from non-smooth distribution through langevin diffusion. URL https://hal.archives-ouvertes.fr/hal-01492056, 2017.
- Yi-An Ma, Niladri Chatterji, Xiang Cheng, Nicolas Flammarion, Peter Bartlett, and Michael I Jordan. Is there an analog of nesterov acceleration for mcmc? arXiv:1902.00996, 2019.
- Mateusz Majka, Aleksandar Mijatovic, and Lukasz Szpruch. Non-asymptotic bounds for sampling algorithms without logconcavity. arXiv:1808.07105, 2018.
- Oren Mangoubi and Nisheeth Vishnoi. Dimensionally tight bounds for second-order hamiltonian monte carlo. In NeurIPS, 2018.
- Ankur Moitra and Andrej Risteski. Fast convergence for langevin diffusion with matrix manifold structure. In arXiv:2002.05576, 2020.
- Felix Otto and Cedric Villani. Generalization of an inequality by talagrand and links with the logarithmic sobolev inequality. Journal of Functional Analysis, 173:361–400, 2000.
- Sam Patterson and Yee Whye Teh. Stochastic gradient riemannian langevin dynamics on the probability simplex. In NIPS, 2013.
- Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis. In COLT, 2017.
- Lillian Ratliff, Samuel Burden, and S. Shankar Sastry. Characterization and computation of local nash equilibria in continuous games. In Fifty-first Annual Allerton Conference, 2013.
- Lillian Ratliff, Samuel Burden, and S. Shankar Sastry. On the characterization of local nash equilibria in continuous games. IEEE Transactions on Automatic Control, 61(8), 2016.
- Gareth Roberts and Osnat Stramer. Langevin diffusions and metropolis-hastings algorithms. Methodology and computing in applied probability, 4(4), 2002.
- Gareth Roberts and Richard Tweedie. Exponential convergence of langevin distributions and their discrete approximation. Bernoulli, 2(4), 1996.
- O Rothaus. Diffusion on compact riemannian manifolds and logarithmic sobolev inequalities. Journal of Functional Analysis, 42, 1981.
- O Rothaus. Hypercontractivity and the bakry-emery criterion. Journal of Functional Analysis, 65, 1986.
- Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Birkhauser, 2015.
- Christof Seiler, Simon Rubinstein-Salzedo, and Susan Holmes. Positive curvature and hamiltonian monte carlo. In NIPS, 2014.
- Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, and Jascha Sohl-Dickstein. Stochastic natural gradient descent draws posterior samples in function space. In NeurIPS, 2018.
- M Talagrand. Transportation cost for gaussian and other product measures. Geometric and Functional Analysis, 6, 1996.
- Santosh Vempala and Andre Wibisono. Rapid convergence of the unadjusted langevin algorithm: Isoperimetry suffices. In NeurIPS, 2019.
- Santosh S. Vempala and Yin-Tat Lee. Geodesic walks in polytopes. In STOC, 2017.
- Santosh S. Vempala and Yin-Tat Lee. Convergence rate of riemannian hamiltonian monte carlo and faster polytope volume computation. In STOC, 2018.
- Feng-Yu Wang. Logarithmic sobolev inequalities on noncompact riemannian manifolds. Probability Theory and Related Fields, 109:417–424, 1997.
- Feng-Yu Wang. On estimation of the logarithmic sobolev constant and gradient estimates of heat semigroups. Probability Theory and Related Fields, 108:87–101, 1997.
- Max Welling and Yee Whye Teh. Bayesian learning via stochastic gradient langevin dynamics. In ICML, 2011.
- Andre Wibisono. Sampling as optimization in the space of measures: The langevin dynamics as a composite optimization problem. In Conference on Learning Theory, 2018.
- Andre Wibisono. Proximal langevin algorithm: Rapid convergence under isoperimetry. In arXiv:1911.01469, 2019.
- Kelvin Shuangjian Zhang, Gabriel Peyre, Jalal Fadili, and Marcelo Pereyra. Wasserstein control of mirror langevin monte carlo. arXiv:2002.04363, 2020.
