Improper Learning for Non-Stochastic Control

Max Simchowitz
Karan Singh
Elad Hazan

COLT, pp. 3320-3436, 2020.


Abstract:

We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.

Introduction
  • The machine learning community has produced a great body of work applying modern statistical and algorithmic techniques to classical control problems.
  • Recent work has turned to a more general paradigm termed the non-stochastic control problem: a model for dynamics that replaces stochastic noise with adversarial perturbations in the dynamics.
  • In this non-stochastic model, it is impossible to pre-compute an instance-wise optimal controller.
  • The authors' non-stochastic framework leads to new results for classical stochastic settings: e.g., the first tight regret bound for linear quadratic Gaussian (LQG) control with an unknown system
Highlights
  • In recent years, the machine learning community has produced a great body of work applying modern statistical and algorithmic techniques to classical control problems
  • We present Gradient Feedback Control (GFC), a unified algorithm which achieves sublinear regret for online control of a partially observed LDS with both adversarial losses and noises, even when the true system is unknown to the learner
  • We assume a finite horizon T ; extensions to infinite horizon can be obtained by a doubling trick
  • We show that online convex optimization with memory obtains improved regret when the losses are strongly convex and smooth, and when the system is excited by persistent noise
  • This work presented a new adaptive controller we termed Gradient Feedback Control (GFC), inspired by Youla's parametrization. This method is suitable for controlling systems with partial observation, where we show an efficient algorithm that attains the first sublinear regret bounds under adversarial noise for both known and unknown systems
  • We intend to compare our guarantees to techniques tailored to the stochastic setting, including certainty-equivalence control (Mania et al. [2019]), robust system-level synthesis (Dean et al. [2018]), and SDP-based relaxations (Cohen et al. [2019])
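The online gradient descent machinery that GFC builds on can be illustrated on a toy online convex optimization problem. A minimal sketch, assuming illustrative quadratic losses fₜ(x) = ‖x − zₜ‖² with targets zₜ of our own choosing (not the paper's setting):

```python
import numpy as np

# Minimal online gradient descent (OGD) sketch. GFC runs OGD over a convex
# controller parametrization; here we show the core update on toy quadratic
# losses f_t(x) = ||x - z_t||^2. Targets z_t and step sizes are assumptions.

def ogd(targets, eta=0.5):
    """Run OGD with step size eta/sqrt(t); return the per-round losses."""
    x = np.zeros_like(targets[0])
    losses = []
    for t, z in enumerate(targets, start=1):
        losses.append(float(np.sum((x - z) ** 2)))  # suffer loss f_t(x_t)
        grad = 2.0 * (x - z)                        # gradient of f_t at x_t
        x = x - (eta / np.sqrt(t)) * grad           # O(1/sqrt(t)) step size
    return losses

rng = np.random.default_rng(0)
targets = [np.ones(3) + 0.1 * rng.standard_normal(3) for _ in range(500)]
learner_loss = sum(ogd(targets))
# Best fixed point in hindsight for squared loss is the mean of the targets.
x_star = np.mean(targets, axis=0)
comparator_loss = sum(float(np.sum((x_star - z) ** 2)) for z in targets)
regret = learner_loss - comparator_loss  # sublinear in the horizon T
```

The comparator here is the best fixed decision in hindsight; in the paper's setting the comparator class is instead a large set of closed-loop policies.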
Methods
  • Choose L0, K0 such that A − BK0 and A − L0C are stable.
  • 3. Perturb L ← (1 + u1)L and K ← (1 + u2)K, where u1, u2 ∼ uniform[0, 1].
  • 4. Check that the estimated closed-loop system and the true closed loop are close.
  • In the interest of brevity, the authors defer the details of the above to future work.
  • Proof of Theorem 10
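The stability requirement above (A − BK0 stable) can be checked numerically: a discrete-time closed loop is stable iff its spectral radius is below 1. A minimal sketch with illustrative matrices of our own choosing, not the paper's:

```python
import numpy as np

# Sketch of the stability check for an initial gain K0: the closed loop
# A - B K0 of a discrete-time LDS is stable iff all eigenvalues lie strictly
# inside the unit disk. A, B, K0 below are illustrative assumptions.

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

A = np.array([[1.1, 0.2],
              [0.0, 0.9]])          # open loop: unstable (eigenvalue 1.1)
B = np.eye(2)
K0 = np.array([[0.4, 0.2],
               [0.0, 0.2]])         # pulls both eigenvalues to 0.7

closed_loop = A - B @ K0            # diag(0.7, 0.7): stable
```

The analogous check for the observer gain L0 applies the same test to A − L0C.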
Results
  • Main Results for Non-Stochastic Control

    For simplicity, the authors assume a finite horizon T; extensions to infinite horizon can be obtained by a doubling trick.
  • The authors will assume the learner has foreknowledge of the relevant decay parameters and system norms.
  • Throughout, let dmin = min{dy, du} and dmax = max{dy, du}.
  • Let C > 0, ρ ∈ (0, 1) and δ ∈ (0, 1).
  • The authors further assume that the system decay ψG satisfies Σ_{i≥n} ‖CAⁱ‖op ≤ ψG(n), and that ψG and the comparator decay function ψ satisfy ψ(n), ψG(n) ≤ Cρⁿ
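The doubling trick mentioned above can be sketched concretely: restart a fixed-horizon algorithm on epochs of lengths 1, 2, 4, …; if each epoch of length τ incurs regret c·√τ, the total over any horizon n stays O(√n). The constant c is an illustrative assumption:

```python
import math

# Doubling-trick sketch: cover an unknown horizon n with epochs of length
# 1, 2, 4, ..., restarting the fixed-horizon algorithm at each epoch. If an
# epoch of length tau incurs regret c*sqrt(tau), the sum over epochs is
# still O(sqrt(n)), only with a larger constant.

def doubled_regret(n, c=1.0):
    """Total regret over horizon n under per-epoch regret c*sqrt(length)."""
    total, covered, k = 0.0, 0, 0
    while covered < n:
        length = 2 ** k
        total += c * math.sqrt(min(length, n - covered))  # last epoch truncated
        covered += length
        k += 1
    return total
```

For n = 1024 this sums to a small constant multiple of √1024 = 32, illustrating that the restarts cost only a constant factor.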
Conclusion
  • This work presented a new adaptive controller the authors termed Gradient Feedback Control (GFC), inspired by Youla's parametrization
  • This method is suitable for controlling systems with partial observation, where the authors show an efficient algorithm that attains the first sublinear regret bounds under adversarial noise for both known and unknown systems.
  • This technique attains optimal regret rates for many regimes of interest.
  • The authors hope to understand how to use these convex parametrizations for related problem formulations, such as robustness to system misspecification, safety constraints, and distributed control
Tables
  • Table1: Summary of our results for online control
  • Table2: The above table describes regret rates for existing algorithms in various settings. The results are grouped into known-system and unknown-system settings, and bold lines further divide the results into non-stochastic and stochastic/semi-adversarial regimes. In each of the four resulting regimes our results strictly improve upon prior art. Noise types: stochastic noise means well-conditioned noise that is bounded or light-tailed, non-stochastic noise means noise selected by an arbitrary adversary, and semi-adversarial noise is an intermediate regime described formally by Assumption 6/6b. Comparator: compared to past work in non-stochastic control, we compete with stabilizing LDCs, which strictly generalize state-feedback control. We note however that for stochastic linear control with fixed quadratic costs, state feedback is optimal, up to additive constants that do not grow with horizon T
Funding
  • Elad Hazan acknowledges funding from NSF grant 1704860
  • Max Simchowitz is generously supported by an Open Philanthropy graduate student fellowship
References
  • Yasin Abbasi-Yadkori and Csaba Szepesvari. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1–26, 2011.
  • Yasin Abbasi-Yadkori, Peter Bartlett, and Varun Kanade. Tracking adversarial targets. In International Conference on Machine Learning, pages 369–377, 2014.
  • Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh. Online control with adversarial disturbances. In International Conference on Machine Learning, pages 111–119, 2019a.
  • Naman Agarwal, Elad Hazan, and Karan Singh. Logarithmic regret for online control. In Advances in Neural Information Processing Systems 32, pages 10175–10184. Curran Associates, Inc., 2019b.
  • Oren Anava, Elad Hazan, and Shie Mannor. Online learning for adversaries with memory: price of past mistakes. In Advances in Neural Information Processing Systems, pages 784–792, 2015.
  • Sanjeev Arora, Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, and Yi Zhang. Towards provable control for unknown linear dynamical systems. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BygpQlbA-.
  • Tamer Basar and Pierre Bernhard. H-infinity optimal control and related minimax design problems: a dynamic game approach. Springer Science & Business Media, 2008.
  • Dimitri Bertsekas. Dynamic programming and optimal control, volume 1. Athena Scientific, Belmont, MA, 2005.
  • Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM Symposium on Theory of Computing, pages 594–603. ACM, 2014.
  • Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
  • Alon Cohen, Avinatan Hasidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar. Online linear quadratic control. In International Conference on Machine Learning, pages 1029–1038, 2018.
  • Alon Cohen, Tomer Koren, and Yishay Mansour. Learning linear-quadratic regulators efficiently with only O(√T) regret. In International Conference on Machine Learning, pages 1300–1309, 2019.
  • Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4188–4197, 2018.
  • Ofer Dekel and Elad Hazan. Better rates for any adversarial deterministic MDP. In International Conference on Machine Learning, pages 675–683, 2013.
  • Olivier Devolder, Francois Glineur, and Yurii Nesterov. First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146(1-2):37–75, 2014.
  • Eyal Even-Dar, Sham M Kakade, and Yishay Mansour. Online Markov decision processes. Mathematics of Operations Research, 34(3):726–736, 2009.
  • Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, and George Michailidis. Input perturbations for adaptive regulation and learning. arXiv preprint arXiv:1811.04258, 2018.
  • Maryam Fazel, Rong Ge, Sham M Kakade, and Mehran Mesbahi. Global convergence of policy gradient methods for the linear quadratic regulator. arXiv preprint arXiv:1801.05039, 2018.
  • Paul J Goulart, Eric C Kerrigan, and Jan M Maciejowski. Optimization over state feedback policies for robust control with constraints. Automatica, 42(4):523–533, 2006.
  • Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016. URL http://dx.doi.org/10.1561/2400000013.
  • Elad Hazan, Karan Singh, and Cyril Zhang. Learning linear dynamical systems via spectral filtering. In Advances in Neural Information Processing Systems, pages 6702–6712, 2017.
  • Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, and Yi Zhang. Spectral filtering for general linear dynamical systems. In Advances in Neural Information Processing Systems, pages 4634–4643, 2018.
  • Elad Hazan, Sham M. Kakade, and Karan Singh. The nonstochastic control problem. arXiv preprint arXiv:1911.12178, 2019.
  • Emilie Kaufmann, Olivier Cappe, and Aurelien Garivier. On the complexity of best-arm identification in multi-armed bandit models. The Journal of Machine Learning Research, 17(1):1–42, 2016.
  • Vladimir Kucera. Stability of discrete linear feedback systems. IFAC Proceedings Volumes, 8(1):573–578, 1975.
  • Lennart Ljung. System identification. Wiley Encyclopedia of Electrical and Electronics Engineering, pages 1–19, 1999.
  • Horia Mania, Stephen Tu, and Benjamin Recht. Certainty equivalent control of LQR is efficient. arXiv preprint arXiv:1902.07826, 2019.
  • Ankur Moitra, William Perry, and Alexander S Wein. How robust are reconstruction thresholds for community detection? In Proceedings of the forty-eighth annual ACM Symposium on Theory of Computing, pages 828–841. ACM, 2016.
  • Samet Oymak and Necmiye Ozay. Non-asymptotic identification of LTI systems from a single trajectory. In 2019 American Control Conference (ACC), pages 5655–5661. IEEE, 2019.
  • Ariadna Quattoni, Xavier Carreras, Michael Collins, and Trevor Darrell. An efficient projection for ℓ1,∞ regularization. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 857–864. ACM, 2009.
  • Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite-time system identification for partially observed LTI systems of unknown order. arXiv preprint arXiv:1902.01848, 2019.
  • Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
  • Max Simchowitz and Dylan J. Foster. Naive exploration is optimal for online LQR. arXiv preprint arXiv:2001.09576, 2020.
  • Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. In Conference on Learning Theory, pages 439–473, 2018.
  • Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning linear dynamical systems with semi-parametric least squares. In Conference on Learning Theory, pages 2714–2802, 2019.
  • Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
  • Robert F Stengel. Optimal control and estimation. Courier Corporation, 1994.
  • Anastasios Tsiamis and George J Pappas. Finite sample analysis of stochastic system identification. arXiv preprint arXiv:1903.09122, 2019.
  • Yuh-Shyang Wang, Nikolai Matni, and John C Doyle. A system level approach to controller synthesis. IEEE Transactions on Automatic Control, 2019.
  • Dante Youla, Hamid Jabr, and Jr Bongiorno. Modern Wiener-Hopf design of optimal controllers – part II: The multivariable case. IEEE Transactions on Automatic Control, 21(3):319–338, 1976.
  • Alexander Zimin and Gergely Neu. Online learning in episodic Markovian decision processes by relative entropy policy search. In Advances in Neural Information Processing Systems, pages 1583–1591, 2013.