
Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet Log-Sobolev

NeurIPS 2020


Abstract

Sampling is a fundamental and arguably very important task with numerous applications in Machine Learning. One approach to sample from a high dimensional distribution $e^{-f}$ for some function $f$ is the Langevin Algorithm (LA). Recently, there has been a lot of progress in showing fast convergence of LA even in cases where $f$ is non-convex. …

Introduction
  • The authors focus on the problem of sampling from a distribution e^{-f(x)} supported on a Riemannian manifold M with standard volume measure.
  • The classic Riemannian Langevin algorithm, e.g.
  • Beyond the classic applications of the Riemannian Langevin Algorithm (RLA), recent progress in [12, 39] shows that sampling from a distribution on a manifold has applications in matrix factorization, principal component analysis, matrix completion, mean-field and continuous games, and GANs. Formally, a game with a finite number of agents is called continuous if the strategy spaces are continuous, either finite-dimensional differentiable manifolds or infinite-dimensional Banach manifolds [43, 44, 12].
  • A mixed strategy is a probability distribution on the strategy manifold, and mixed Nash equilibria can be approximated by Langevin dynamics.
Highlights
  • We focus on the problem of sampling from a distribution e^{-f(x)} supported on a Riemannian manifold M with standard volume measure.
  • We propose the Geodesic Langevin Algorithm (GLA) as a natural generalization of the unadjusted Langevin algorithm (ULA) from Euclidean space to the manifold M (a sketch of one iteration appears after these highlights).
  • The equivalence between Langevin dynamics and optimization in the space of densities is based on the result of [26, 59] that Langevin dynamics captures the gradient flow of the relative entropy functional in the space of densities equipped with the Wasserstein metric.
  • In this paper we focus on the problem of sampling from a distribution on a Riemannian manifold and propose the Geodesic Langevin Algorithm.
  • By leveraging the geometric meaning of GLA, we provide a non-asymptotic convergence guarantee: the KL divergence decreases fast along the iterations of GLA.
  • By assuming that we have full access to the geometric data of the manifold, we can control the bias between the stationary distribution of GLA and the target distribution to be arbitrarily small through the choice of stepsize.
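To make the GLA update concrete, the following is a minimal, hypothetical sketch of one iteration on the unit sphere, where the exponential map has a closed form. The stepsize eta, the potential f, and the projected Gaussian noise are illustrative assumptions; this is not the authors' implementation, and it omits any metric-dependent correction terms the paper may use.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the geodesic from x in tangent direction v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def riemannian_grad(x, euclidean_grad):
    """Project the ambient (Euclidean) gradient of f onto the tangent space at x."""
    g = euclidean_grad(x)
    return g - np.dot(g, x) * x

def gla_step(x, euclidean_grad, eta, rng):
    """One illustrative GLA-style step: Exp_x(-eta * grad f(x) + sqrt(2*eta) * xi),
    with xi a standard Gaussian in the tangent space."""
    xi = rng.standard_normal(x.shape)
    xi = xi - np.dot(xi, x) * x  # project the noise onto the tangent space
    v = -eta * riemannian_grad(x, euclidean_grad) + np.sqrt(2.0 * eta) * xi
    return sphere_exp(x, v)

# Example: approximately sample from e^{-f} on S^2 with f(x) = -<mu, x> (hypothetical target).
rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])
grad_f = lambda x: -mu  # Euclidean gradient of f(x) = -<mu, x>
x = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    x = gla_step(x, grad_f, eta=0.01, rng=rng)
```

Because the exponential map is defined on the whole tangent space, the same template applies on any manifold for which Exp and the Riemannian gradient are computable, which is the sense in which GLA is defined globally.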
Results
  • The equivalence between Langevin dynamics and optimization in the space of densities is based on the result of [26, 59] that Langevin dynamics captures the gradient flow of the relative entropy functional in the space of densities equipped with the Wasserstein metric (see the sketch after these bullets).
  • The stationary solution of equation (3) is e^{-f(x)}, which minimizes the entropy-regularized functional L(ρ), so the optimization problem over the space of densities reduces to tracking the evolution of ρ(x, t) defined by equation (3).
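For readers without access to equation (3), the following is a sketch of the standard picture from [26, 59] on a manifold M: the entropy-regularized functional and the Fokker-Planck equation whose Wasserstein gradient flow it generates. The notation here is generic and not copied from the paper.

```latex
\begin{align*}
  % entropy-regularized objective over densities on M (relative entropy up to a constant)
  L(\rho) &= \int_M f\,\rho\,\mathrm{dvol} + \int_M \rho\log\rho\,\mathrm{dvol}
           = H\!\left(\rho \,\middle\|\, e^{-f}\right) + \text{const}, \\
  % its Wasserstein gradient flow is the Fokker--Planck equation of the Langevin dynamics
  \partial_t \rho &= \nabla\!\cdot(\rho\,\nabla f) + \Delta\rho
                   = \nabla\!\cdot\!\bigl(\rho\,\nabla(f+\log\rho)\bigr).
\end{align*}
```

The stationary solution ρ*(x) ∝ e^{-f(x)} minimizes L(ρ), which is the sense in which sampling becomes an optimization problem over the space of densities.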
Conclusion
  • In this paper the authors focus on the problem of sampling from a distribution on a Riemannian manifold and propose the Geodesic Langevin Algorithm.
  • GLA modifies the Riemannian Langevin algorithm by using the exponential map, so that the algorithm is defined globally.
  • By leveraging the geometric meaning of GLA, the authors provide a non-asymptotic convergence guarantee: the KL divergence decreases fast along the iterations of GLA (the continuous-time version of this decay is recalled after this list).
  • By assuming full access to the geometric data of the manifold, the authors can make the bias between the stationary distribution of GLA and the target distribution arbitrarily small through the choice of stepsize.
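For reference, the continuous-time decay underlying this kind of guarantee is the standard log-Sobolev argument: if the target ν ∝ e^{-f} satisfies a log-Sobolev inequality with constant α, then along the Langevin (Fokker-Planck) flow

```latex
\frac{d}{dt}\, H(\rho_t \,\|\, \nu)
  \;=\; -\,J(\rho_t \,\|\, \nu)
  \;\le\; -\,2\alpha\, H(\rho_t \,\|\, \nu)
  \quad\Longrightarrow\quad
  H(\rho_t \,\|\, \nu) \;\le\; e^{-2\alpha t}\, H(\rho_0 \,\|\, \nu),
```

where J denotes the relative Fisher information. Consistent with the highlights above, the discrete-time GLA guarantee adds a stepsize-dependent bias term to this contraction, which is why the bias can be driven arbitrarily small by shrinking the stepsize; this sketch states only the continuous-time part.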
Related work
  • The unadjusted Langevin algorithm (ULA) for sampling from a strongly log-concave density in Euclidean space has been studied extensively in the literature; bounds for ULA are known from [7, 9, 11, 13]. The case where f is strongly convex and has a Lipschitz gradient is studied by [10, 14, 16]. Because of the discretization, ULA is biased: it converges to a limit distribution different from that of the continuous Langevin equation. The Metropolis-Hastings correction is widely used to remove this bias, e.g. [46, 17]. A simplified correction, the symmetrized Langevin algorithm, is proposed by [59] and has a smaller bias than ULA. Convergence results for the Proximal Langevin Algorithm (PLA) are obtained in [60]. When the target distribution is log-concave, other algorithms are also proven to converge rapidly, e.g. Langevin Monte Carlo [3], ball walk and hit-and-run [27, 32, 34, 33], and Hamiltonian Monte Carlo [15, 55, 38]. The underdamped version of Langevin dynamics under a log-Sobolev inequality is studied by [36], where an iteration complexity for the discrete-time algorithm with better dependence on the dimension is provided. A coupling approach is used by [18] to quantify convergence to equilibrium for Langevin dynamics; it yields contraction in a particular Wasserstein distance and provides precise bounds for convergence to equilibrium. Densities that are neither smooth nor log-concave are studied in [35], where asymptotic consistency guarantees are provided. For the Wasserstein distance, [8, 37, 42] provide convergence bounds. Earlier work on stochastic gradient Langevin dynamics with applications in Bayesian learning is proposed by [58], and Langevin Monte Carlo under a weaker smoothness assumption is studied by [6]. To improve sample quality, [21] develops a theory of weak convergence for the kernel Stein discrepancy based on Stein's method. In general, sampling from non-log-concave densities is hard; [19] gives an exponential lower bound on the number of required queries.
Reference
  • Naman Agarwal, Nicolas Boumal, Brian Bullins, and Coralia Cartis. Adaptive regularization with cubics on manifolds. Mathematical Programming, 2020.
  • G. G. Batrouni, H. Kawai, and Pietro Rossi. Coordinate-independent formulation of the Langevin equation. Journal of Mathematical Physics, 27, 1986.
  • Espen Bernton. Langevin Monte Carlo and JKO splitting. In COLT, 2018.
  • Marcus Brubaker, Mathieu Salzmann, and Raquel Urtasun. A family of MCMC methods on implicitly defined manifolds. In AISTATS, 2012.
  • Simon Byrne and Mark Girolami. Geodesic Monte Carlo on embedded manifolds. Scandinavian Journal of Statistics, 40(4), 2013.
  • N. S. Chatterji, J. Diakonikolas, M. I. Jordan, and Peter Bartlett. Langevin Monte Carlo without smoothness. arXiv:1905.13285, 2019.
  • Xiang Cheng and Peter Bartlett. Convergence of Langevin MCMC in KL-divergence. Proceedings of Machine Learning Research, 83, 2018.
  • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter Bartlett, and Michael I. Jordan. Sharp convergence rates for Langevin dynamics in the nonconvex setting. arXiv:1805.01648, 2018.
  • Arnak Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In COLT, 2017.
  • Arnak Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 2017.
  • Arnak Dalalyan and Avetik Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 2019.
  • Carles Domingo-Enrich, Samy Jelassi, Arthur Mensch, Grant Rotskoff, and Joan Bruna. A mean-field analysis of two-player zero-sum games. https://arxiv.org/abs/2002.06277, 2020.
  • Alain Durmus, Szymon Majewski, and Blazej Miasojedow. Analysis of Langevin Monte Carlo via convex optimization. arXiv:1802.09188, 2018.
  • Alain Durmus and Eric Moulines. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. The Annals of Applied Probability, 27(3), 2017.
  • Alain Durmus, Eric Moulines, and Eero Saksman. On the convergence of Hamiltonian Monte Carlo. arXiv:1705.00166, 2017.
  • Alain Durmus and Eric Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli, 25(4A), 2019.
  • Raaz Dwivedi, Yuansi Chen, Martin Wainwright, and Bin Yu. Log-concave sampling: Metropolis-Hastings algorithms are fast! In COLT, 2018.
  • Andreas Eberle, Arnaud Guillin, and Raphael Zimmer. Coupling and quantitative contraction rates for Langevin dynamics. arXiv:1703.01617, 2018.
  • Rong Ge, Holden Lee, and Andrej Risteski. Beyond log-concavity: Provable guarantees for sampling multi-modal distributions using simulated tempering Langevin Monte Carlo. In NeurIPS, 2018.
  • Mark Girolami and Ben Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 2011.
  • Jackson Gorham and Lester Mackey. Measuring sample quality with kernels. In ICML, 2017.
  • Leonard Gross. Logarithmic Sobolev inequalities and contractivity properties of semigroups. Lecture Notes in Mathematics, 1563, 1993.
  • Ya-Ping Hsieh, Ali Kavis, Paul Rolland, and Volkan Cevher. Mirrored Langevin dynamics. In NeurIPS, 2018.
  • Elton P. Hsu. Stochastic Analysis on Manifolds. American Mathematical Society, 2002.
  • Alfredo Garbuno-Inigo, Nikolas Nusken, and Sebastian Reich. Affine invariant interacting Langevin dynamics for Bayesian inference. arXiv:1912.02859, 2019.
  • Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker-Planck equation. SIAM Journal on Mathematical Analysis, 29(1), 1998.
  • R. Kannan, L. Lovasz, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 11, 1997.
  • Michel Ledoux. Concentration of measure and logarithmic Sobolev inequalities. Seminaire de Probabilites, 33, 1999.
  • John Lee. Introduction to Riemannian Manifolds, volume 176 of GTM. Springer, 2018.
  • Chang Liu, Jun Zhu, and Yang Song. Stochastic gradient geodesic MCMC methods. In NIPS, 2016.
  • Chang Liu, Jingwei Zhuo, and Jun Zhu. Understanding MCMC dynamics as flows on the Wasserstein space. In ICML, 2019.
  • L. Lovasz and S. Vempala. Fast algorithms for logconcave functions: sampling, rounding, integration and optimization. In FOCS, 2006.
  • L. Lovasz and S. Vempala. Hit-and-run from a corner. SIAM Journal on Computing, 35(4), 2006.
  • L. Lovasz and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 30(3), 2007.
  • Tung Duy Luu, Jalal Fadili, and Christophe Chesneau. Sampling from non-smooth distribution through Langevin diffusion. https://hal.archives-ouvertes.fr/hal-01492056, 2017.
  • Yi-An Ma, Niladri Chatterji, Xiang Cheng, Nicolas Flammarion, Peter Bartlett, and Michael I. Jordan. Is there an analog of Nesterov acceleration for MCMC? arXiv:1902.00996, 2019.
  • Mateusz Majka, Aleksandar Mijatovic, and Lukasz Szpruch. Non-asymptotic bounds for sampling algorithms without log-concavity. arXiv:1808.07105, 2018.
  • Oren Mangoubi and Nisheeth Vishnoi. Dimensionally tight bounds for second-order Hamiltonian Monte Carlo. In NeurIPS, 2018.
  • Ankur Moitra and Andrej Risteski. Fast convergence for Langevin diffusion with matrix manifold structure. arXiv:2002.05576, 2020.
  • Felix Otto and Cedric Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. Journal of Functional Analysis, 173:361–400, 2000.
  • Sam Patterson and Yee Whye Teh. Stochastic gradient Riemannian Langevin dynamics on the probability simplex. In NIPS, 2013.
  • Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In COLT, 2017.
  • Lillian Ratliff, Samuel Burden, and S. Shankar Sastry. Characterization and computation of local Nash equilibria in continuous games. In Fifty-first Annual Allerton Conference, 2013.
  • Lillian Ratliff, Samuel Burden, and S. Shankar Sastry. On the characterization of local Nash equilibria in continuous games. IEEE Transactions on Automatic Control, 61(8), 2016.
  • Gareth Roberts and Osnat Stramer. Langevin diffusions and Metropolis-Hastings algorithms. Methodology and Computing in Applied Probability, 4(4), 2002.
  • Gareth Roberts and Richard Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4), 1996.
  • O. Rothaus. Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. Journal of Functional Analysis, 42, 1981.
  • O. Rothaus. Hypercontractivity and the Bakry-Emery criterion. Journal of Functional Analysis, 65, 1986.
  • Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Birkhauser, 2015.
  • Christof Seiler, Simon Rubinstein-Salzedo, and Susan Holmes. Positive curvature and Hamiltonian Monte Carlo. In NIPS, 2014.
  • Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, and Jascha Sohl-Dickstein. Stochastic natural gradient descent draws posterior samples in function space. In NeurIPS, 2018.
  • M. Talagrand. Transportation cost for Gaussian and other product measures. Geometric and Functional Analysis, 6, 1996.
  • Santosh Vempala and Andre Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In NeurIPS, 2019.
  • Santosh S. Vempala and Yin-Tat Lee. Geodesic walks in polytopes. In STOC, 2017.
  • Santosh S. Vempala and Yin-Tat Lee. Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation. In STOC, 2018.
  • Feng-Yu Wang. Logarithmic Sobolev inequalities on noncompact Riemannian manifolds. Probability Theory and Related Fields, 109:417–424, 1997.
  • Feng-Yu Wang. On estimation of the logarithmic Sobolev constant and gradient estimates of heat semigroups. Probability Theory and Related Fields, 108:87–101, 1997.
  • Max Welling and Yee Whye Teh. Bayesian learning via stochastic gradient Langevin dynamics. In ICML, 2011.
  • Andre Wibisono. Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem. In COLT, 2018.
  • Andre Wibisono. Proximal Langevin algorithm: Rapid convergence under isoperimetry. arXiv:1911.01469, 2019.
  • Kelvin Shuangjian Zhang, Gabriel Peyre, Jalal Fadili, and Marcelo Pereyra. Wasserstein control of mirror Langevin Monte Carlo. arXiv:2002.04363, 2020.
Authors
Xiao Wang
Qi Lei