A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval

NeurIPS 2020

Abstract

We analyze continuous-time mirror descent applied to sparse phase retrieval, which is the problem of recovering sparse signals from a set of magnitude-only measurements. We apply mirror descent to the unconstrained empirical risk minimization problem (batch setting), using the square loss and square measurements. We provide a convergence analysis […]
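
For concreteness, the batch objective described in the abstract (square loss on squared Gaussian measurements) can be written in standard notation as

\[
F(w) \;=\; \frac{1}{4m} \sum_{j=1}^{m} \Big( (a_j^\top w)^2 - y_j \Big)^2,
\qquad y_j = (a_j^\top x^\star)^2,
\quad a_j \sim \mathcal{N}(0, I_n),
\]

where $x^\star \in \mathbb{R}^n$ is the $k$-sparse signal to be recovered; this is the usual formulation for this setting and may differ in constants from the paper's own equation.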
Introduction
  • Mirror descent [39] is becoming increasingly popular in a variety of settings in optimization and machine learning.
  • This line of work spans a range of settings [18, 20, 23, 24, 32, 33, 37, 61, 63, 64]; the authors contribute to it by analyzing continuous-time mirror descent for the non-convex problem of sparse phase retrieval.
  • In Theorem 2, the initial linear-convergence stage of the Bregman divergence corresponds to the variables $X_i(t)$ on the support being fitted; to establish linear convergence, the authors crucially use the bound (8) of Lemma 1 together with the fact that the second term $\|X_{S^c}(t)\|_1$ is negligibly small compared to $\|X_S(t) - x^\star_S\|_2^2$ (the mirror flow and the Bregman divergence $D_\Phi$ are recalled after this list).
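
For reference, the objects appearing in the bullets above are the standard ones (generic definitions, not wording from the paper): continuous-time mirror descent with mirror map $\Phi$ follows the mirror flow, and $D_\Phi$ denotes the Bregman divergence induced by $\Phi$,

\[
\frac{d}{dt}\,\nabla\Phi\big(X(t)\big) \;=\; -\nabla F\big(X(t)\big),
\qquad
D_\Phi(x^\star, X) \;=\; \Phi(x^\star) - \Phi(X) - \big\langle \nabla\Phi(X),\, x^\star - X \big\rangle .
\]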
Highlights
  • Mirror descent [39] is becoming increasingly popular in a variety of settings in optimization and machine learning
  • While the variational coherence property as defined in [63, 64] precludes the existence of saddle points and is not satisfied in the sparse phase retrieval problem, we show that the defining inequality is satisfied along the trajectory of mirror descent, which is what allows us to establish the convergence analysis
  • We provided a convergence analysis of continuous-time mirror descent applied to sparse phase retrieval
  • We proved that, equipped with the hypentropy mirror map, mirror descent recovers any $k$-sparse signal $x^\star \in \mathbb{R}^n$ with $x^\star_{\min} = \Omega(1/\sqrt{k})$ from $O(k^2)$ Gaussian measurements
  • As Hadamard Wirtinger flow (HWF) can be recovered as a discrete-time first-order approximation to the mirror descent algorithm we analyzed, our results provide a principled theoretical understanding of HWF
  • Our continuous-time analysis suggests how the initialization size in HWF affects convergence, and that choosing the initialization size sufficiently small can result in far fewer iterations being necessary to reach any given precision $\epsilon > 0$ (a discretized sketch of the algorithm follows this list)
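
To make these highlights concrete, the following is a minimal Python sketch of a discretized version of the algorithm: the EG± form of mirror descent with the hypentropy mirror map (parameter beta), applied to the batch objective above. The one-coordinate initialization rule, the toy signal model, and the values of the step size eta, of beta, and of the iteration count are illustrative assumptions for this sketch, not the constants or the exact initialization used in the paper.

import numpy as np

def grad_F(w, A, y):
    # Gradient of F(w) = (1/4m) * sum_j ((a_j^T w)^2 - y_j)^2.
    r = A @ w
    return A.T @ ((r ** 2 - y) * r) / len(y)

def eg_pm_sparse_phase_retrieval(A, y, beta=1e-6, eta=0.02, n_iter=3000):
    # EG+/- form of mirror descent: multiplicative updates on a positive part u
    # and a negative part v, with iterate w = u - v.  Keeping u * v = beta^2 / 4
    # corresponds to (discrete-time) mirror descent with the hypentropy mirror
    # map of parameter beta.  eta, beta and n_iter are illustrative choices.
    m, n = A.shape
    # Illustrative one-coordinate initialization: pick the coordinate most
    # correlated with the measurements and give it a rough signal-scale value.
    i0 = np.argmax((y @ A ** 2) / m)
    w0 = np.zeros(n)
    w0[i0] = np.sqrt(y.mean())
    u = (w0 + np.sqrt(w0 ** 2 + beta ** 2)) / 2.0  # places (u, v) on u*v = beta^2/4
    v = u - w0
    for _ in range(n_iter):
        g = grad_F(u - v, A, y)
        u *= np.exp(-eta * g)  # multiplicative update on the positive part
        v *= np.exp(+eta * g)  # multiplicative update on the negative part
    return u - v

# Toy usage: recover a k-sparse signal (up to a global sign) from squared
# Gaussian measurements.
rng = np.random.default_rng(0)
n, k, m = 200, 5, 2000
x_star = np.zeros(n)
x_star[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
A = rng.standard_normal((m, n))
y = (A @ x_star) ** 2
x_hat = eg_pm_sparse_phase_retrieval(A, y)
err = min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star))
print("relative recovery error:", err / np.linalg.norm(x_star))

The error is measured up to a global sign, since magnitude-only measurements determine the signal only up to ±1; making the mirror-map parameter beta smaller lowers the precision up to which convergence is linear, consistent with the role of the initialization size in HWF discussed below.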
Results
  • Scaling with signal magnitude: when analyzing the convergence speed of continuous-time mirror descent equipped with the hypentropy mirror map for sparse phase retrieval, the time $t$ appears multiplied by the squared signal magnitude $\|x^\star\|_2^2$ (the mirror map and the resulting flow are recalled after this list)
  • When considering the algorithm in discrete time, this suggests that the step size should scale like $1/\|x^\star\|_2^2$; a similar observation has been made in the case of gradient descent for phase retrieval [36], where the step size scales as $1/\|x^\star\|_2^2$
  • Similar to the discrete case [25], a brief computation shows that the exponentiated gradient algorithm EG± (17) with initialization (18) is equivalent to mirror descent (15) with a corresponding initialization
  • Theorem 2 implies that the precision up to which convergence is linear is controlled by the mirror map parameter β or, equivalently, by the initialization size in HWF.
  • The authors provided a convergence analysis of continuous-time mirror descent applied to sparse phase retrieval.
  • The authors' continuous-time analysis suggests how the initialization size in HWF affects convergence, and that choosing the initialization size sufficiently small can result in far fewer iterations being necessary to reach any given precision $\epsilon > 0$.
  • In Appendix A, the authors show the equivalence between continuous-time mirror descent equipped with the hypentropy mirror map and the exponentiated gradient algorithm described in Section 5.
  • The authors provide three supporting lemmas characterizing the behavior of mirror descent, which will be useful to prove Theorem 2.
  • Guided by the analysis of the population dynamics and the fact that Lemma 1 plays a central role in bounding (29) in terms of the Bregman divergence $D_\Phi(x^\star, X(t))$, the authors divide the analysis of the convergence of mirror descent into two stages.
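
For reference, the hypentropy mirror map of [25] with parameter $\beta > 0$, and the coordinate-wise form that the resulting mirror flow takes, are (standard computations, recalled here for context rather than quoted from the paper)

\[
\Phi_\beta(x) \;=\; \sum_{i=1}^{n} \Big( x_i \,\operatorname{arcsinh}(x_i/\beta) - \sqrt{x_i^2 + \beta^2} \Big),
\qquad
\dot X_i(t) \;=\; -\sqrt{X_i(t)^2 + \beta^2}\;\big(\nabla F(X(t))\big)_i ,
\]

so that $\beta$ acts both as an effective initialization scale and as the parameter controlling the precision up to which convergence is linear (Theorem 2).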
Conclusion
  • The first sum is bounded by Lemma 10: recalling $m \geq c_1(\gamma)\,k^2 \log^2 n$, the authors have, with probability at least $1 - c_4 n^{-13}$, a bound on the maximum over $l \notin S$ of the corresponding term.
  • The other terms can all be bounded as follows: $\frac{4}{m}\sum_{j=1}^{m} (A_{j,S}^\top x_S)^3 (A_{j,S^c}^\top x_{S^c})$ […]
  • The authors leave a full theoretical investigation of HWF, with a proper discussion on step-size tuning, for future work
Funding
  • Fan Wu is supported by the EPSRC and MRC through the OxWaSP CDT programme (EP/L016710/1)
References
  • [1] A. Ali, J. Z. Kolter, and R. J. Tibshirani. A continuous-time view of early stopping for least squares. In International Conference on Artificial Intelligence and Statistics, pages 1370–1378, 2019.
  • [2] A. Ali, E. Dobriban, and R. J. Tibshirani. The implicit regularization of stochastic gradient flow for least squares. arXiv preprint arXiv:2003.07802, 2020.
  • [3] E. Amid and M. K. Warmuth. Interpolating between gradient descent and exponentiated gradient using reparametrized gradient descent. arXiv preprint arXiv:2002.10487, 2020.
  • [4] S. Arora, N. Cohen, W. Hu, and Y. Luo. Implicit regularization in deep matrix factorization. In Advances in Neural Information Processing Systems, pages 7411–7422, 2019.
  • [5] J.-Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11(94):2785–2836, 2010.
  • [6] J.-Y. Audibert, S. Bubeck, and G. Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31–45, 2013.
  • [7] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
  • [8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.
  • [9] S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8:231–358, 2015.
  • [10] O. Bunk, A. Diaz, F. Pfeiffer, C. David, B. Schmitt, D. K. Satapathy, and J. F. Veen. Diffractive imaging for periodic samples: Retrieving one-dimensional concentration profiles across microfluidic channels. Acta Crystallographica Section A: Foundations of Crystallography, 63(4):306–314, 2007.
  • [11] T. Cai, X. Li, and Z. Ma. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Annals of Statistics, 44(5):2221–2251, 2016.
  • [12] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
  • [13] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
  • [14] Y. Chen and E. J. Candès. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems, pages 739–747, 2015.
  • [15] L. Chizat and F. Bach. On the global convergence of gradient descent for over-parametrized models using optimal transport. In Advances in Neural Information Processing Systems, pages 3036–3046, 2018.
  • [16] F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79–127, 2006.
  • [17] J. V. Corbett. The Pauli problem, state reconstruction and quantum real numbers. Reports on Mathematical Physics, 57(1):53–68, 2006.
  • [18] C. D. Dang and G. Lan. Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM Journal on Optimization, 25(2):856–881, 2015.
  • [19] … on weakly convex functions. arXiv preprint arXiv:1802.02988, 2018.
  • [20] D. Davis and B. Grimmer. Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. SIAM Journal on Optimization, 29(3):1908–1930, 2019.
  • [21] J. R. Fienup. Phase retrieval algorithms: A comparison. Applied Optics, 21(15):2758–2769, 1982.
  • [22] D. J. Fresen. Variations and extensions of the Gaussian concentration inequality. arXiv preprint arXiv:1812.10938, 2018.
  • [23] S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
  • [24] S. Ghadimi, G. Lan, and H. Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 155(1–2):267–305, 2016.
  • [25] U. Ghai, E. Hazan, and Y. Singer. Exponentiated gradient meets gradient descent. In International Conference on Algorithmic Learning Theory, pages 386–407, 2020.
  • [26] S. Gunasekar, B. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro. Implicit regularization in matrix factorization. In Advances in Neural Information Processing Systems, pages 6151–6159, 2017.
  • [27] S. Gunasekar, B. Woodworth, and N. Srebro. Mirrorless mirror descent: a more natural discretization of Riemannian gradient flow. arXiv preprint arXiv:2004.01025, 2020.
  • [28] P. Hand and V. Voroninski. Compressed sensing from phaseless Gaussian measurements via linear programming in the natural parameter spaces. arXiv preprint arXiv:1611.05985, 2016.
  • [29] P. D. Hoff. Lasso, fractional norm and structured sparse estimation using a Hadamard product parametrization. Computational Statistics & Data Analysis, 115:186–198, 2017.
  • [30] K. Jaganathan, Y. C. Eldar, and B. Hassibi. Phase retrieval: An overview of recent developments. In A. Stern, editor, Optical Compressive Imaging, chapter 13, pages 263–296. Taylor & Francis Group, Boca Raton, FL, 2016.
  • [31] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997.
  • [32] W. Kotłowski and G. Neu. Bandit principal component analysis. arXiv preprint arXiv:1902.03035, 2019.
  • [33] W. Krichene, M. Balandat, C. Tomlin, and A. Bayen. The hedge algorithm on a continuum. In International Conference on Machine Learning, pages 824–832, 2015.
  • [34] X. Li and V. Voroninski. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 45(5):3019–3033, 2013.
  • [35] Y. Li, T. Ma, and H. Zhang. Algorithmic regularization in over-parametrized matrix sensing and neural networks with quadratic activation. In Conference on Learning Theory, pages 2–47, 2018.
  • [36] C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion. In International Conference on Machine Learning, pages 3345–3354, 2018.
  • [37] O.-A. Maillard and R. Munos. Online learning in adversarial Lipschitz environments. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 305–320, 2010.
  • [38] S. Mei, T. Misiakiewicz, and A. Montanari. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit. In Conference on Learning Theory, pages 1–77, 2019.
  • [39] A. Nemirovski and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
  • [40] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
  • [41] P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. IEEE Transactions on Signal Processing, 63(18):4814–4826, 2015.
  • [42] H. Ohlsson, A. Y. Yang, R. Dong, and S. S. Sastry. CPRL – an extension of compressive sensing to the phase retrieval problem. In Advances in Neural Information Processing Systems, pages 1367–1375, 2012.
  • [43] D. W. Peterson. A review of constraint qualifications in finite-dimensional spaces. SIAM Review, 15(3):639–654, 1973.
  • [44] M. Raginsky and J. Bouvrie. Continuous-time stochastic mirror descent on a network: variance reduction, consensus, convergence. In IEEE Conference on Decision and Control, pages 6793–6800, 2012.
  • [45] G. M. Rotskoff and E. Vanden-Eijnden. Trainability and accuracy of neural networks: an interacting particle system approach. arXiv preprint arXiv:1805.00915, 2018.
  • [46] Y. Shechtman, A. Beck, and Y. C. Eldar. GESPAR: Efficient phase retrieval of sparse signals. IEEE Transactions on Signal Processing, 62(4):928–938, 2014.
  • [47] P. Schniter and S. Rangan. Compressive phase retrieval via generalized approximate message passing. IEEE Transactions on Signal Processing, 63(4):1043–1055, 2015.
  • [48] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4:107–194, 2015.
  • [49] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  • [50] A. Suggala, A. Prasad, and P. K. Ravikumar. Connecting optimization and regularization paths. In Advances in Neural Information Processing Systems, pages 10608–10619, 2018.
  • [51] T. Vaškevičius, V. Kanade, and P. Rebeschini. Implicit regularization for optimal sparse recovery. In Advances in Neural Information Processing Systems, pages 2968–2979, 2019.
  • [52] T. Vaškevičius, V. Kanade, and P. Rebeschini. The statistical complexity of early-stopped mirror descent. arXiv preprint arXiv:2002.00189, 2020.
  • [53] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. Eldar and G. Kutyniok, editors, Compressed Sensing, Theory and Applications, chapter 5, pages 210–268. Cambridge University Press, Cambridge, 2012.
  • [54] A. Walther. The question of phase retrieval in optics. Optica Acta, 10(1):41–49, 1963.
  • [55] G. Wang, G. B. Giannakis, and Y. C. Eldar. Solving systems of random quadratic equations via truncated amplitude flow. IEEE Transactions on Information Theory, 64(2):773–794, 2017.
  • [56] G. Wang, L. Zhang, G. B. Giannakis, M. Akçakaya, and J. Chen. Sparse phase retrieval via truncated amplitude flow. IEEE Transactions on Signal Processing, 66(2):479–491, 2018.
  • [57] M. K. Warmuth and A. Jagota. Continuous and discrete time nonlinear gradient descent: relative loss bounds and convergence. In Electronic Proceedings of the Fifth International Symposium on Artificial Intelligence and Mathematics, 1998.
  • [58] F. Wu and P. Rebeschini. Hadamard Wirtinger flow for sparse phase retrieval. arXiv preprint arXiv:2006.01065, 2020.
  • [59] Z. Yuan, H. Wang, and Q. Wang. Phase retrieval via sparse Wirtinger flow. Journal of Computational and Applied Mathematics, 355:162–173, 2019.
  • [60] L. Zhang, G. Wang, G. B. Giannakis, and J. Chen. Compressive phase retrieval via reweighted amplitude flow. IEEE Transactions on Signal Processing, 66(19):5029–5040, 2018.
  • [61] S. Zhang and N. He. On the convergence rate of stochastic mirror descent for nonsmooth nonconvex optimization. arXiv preprint arXiv:1806.04781, 2018.
  • [62] P. Zhao, Y. Yang, and Q.-C. He. Implicit regularization via Hadamard product overparametrization in high-dimensional linear regression. arXiv preprint arXiv:1903.09367, 2019.
  • [63] Z. Zhou, P. Mertikopoulos, N. Bambos, S. P. Boyd, and P. W. Glynn. Stochastic mirror descent in variationally coherent optimization problems. In Advances in Neural Information Processing Systems, pages 7040–7049, 2017.
  • [64] Z. Zhou, P. Mertikopoulos, N. Bambos, S. P. Boyd, and P. W. Glynn. On the convergence of mirror descent beyond stochastic convex programming. SIAM Journal on Optimization, 30(1):687–716, 2020.
Authors
Fan Wu
Patrick Rebeschini