# Improper Learning for Non-Stochastic Control

COLT 2020, pp. 3320–3436.

Abstract:

We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization …

Introduction

- The machine learning community has produced a great body of work applying modern statistical and algorithmic techniques to classical control problems.
- Recent work has turned to a more general paradigm termed the non-stochastic control problem: a model for dynamics that replaces stochastic noise with adversarial perturbations in the dynamics.
- In this non-stochastic model, it is impossible to pre-compute an instance-wise optimal controller.
- The authors' non-stochastic framework leads to new results for classical stochastic settings: e.g., the first tight regret bound for linear quadratic gaussian control (LQG) with an unknown system

Highlights

- In recent years, the machine learning community has produced a great body of work applying modern statistical and algorithmic techniques to classical control problems
- We present Gradient Feedback Control (GFC), a unified algorithm which achieves sublinear regret for online control of a partially observed LDS with both adversarial losses and noises, even when the true system is unknown to the learner
- We assume a finite horizon T ; extensions to infinite horizon can be obtained by a doubling trick
- We show that online convex optimization with memory obtains improved regret when the losses are strongly convex and smooth, and when the system is excited by persistent noise
- This work presented a new adaptive controller we termed Gradient Feedback Control (GFC), inspired by the Youla parametrization. This method is suitable for controlling systems with partial observation, where we show an efficient algorithm that attains the first sublinear regret bounds under adversarial noise for both known and unknown systems
- We intend to compare our guarantees to techniques tailored to the stochastic setting, including Certainty Equivalence Control (Mania et al. [2019]), Robust System Level Synthesis (Dean et al. [2018]), and SDP-based relaxations (Cohen et al. [2019])
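At its core, the approach above is online gradient descent over a convex controller parametrization. The sketch below is a toy illustration of that idea only, not the paper's GFC: it uses a scalar, fully observed system with a disturbance-action controller, and all constants (dynamics, memory length, step size, clipping box, truncation length) are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.9, 1.0          # known, stable scalar system: x_{t+1} = a*x_t + b*u_t + w_t
m, h = 5, 25             # controller memory and surrogate truncation length
T, lr = 500, 0.01        # horizon and OGD step size
pad = m + h              # zero-padding so times < 0 contribute nothing
W = np.zeros(pad + T)    # W[pad + t] holds the disturbance w_t
M = np.zeros(m)          # disturbance-action controller: u_t = sum_i M[i]*w_{t-1-i}

def past_noise(t):
    """Vector (w_{t-1}, ..., w_{t-m}); the control is u_t = M @ past_noise(t)."""
    return W[pad + t - m : pad + t][::-1]

x, total_cost = 0.0, 0.0
for t in range(T):
    u = float(M @ past_noise(t))       # play the current controller
    total_cost += x**2 + u**2          # convex loss on state and control
    w_t = rng.uniform(-0.5, 0.5)       # an adversary would choose this
    W[pad + t] = w_t
    x = a * x + b * u + w_t            # the true system evolves

    # "With-memory" surrogate loss: the truncated counterfactual state y_t(M),
    # reached by playing M against the recorded noises from rest, is linear
    # in M, so its gradient is exact and cheap to accumulate.
    y, dy = 0.0, np.zeros(m)
    for k in range(h):
        s = t - 1 - k
        y += a**k * (b * float(M @ past_noise(s)) + W[pad + s])
        dy += a**k * b * past_noise(s)
    du = past_noise(t)
    grad = 2 * y * dy + 2 * u * du
    M = np.clip(M - lr * grad, -1.0, 1.0)  # OGD step + projection onto a box

avg_cost = total_cost / T
print("average per-step cost:", avg_cost)
```

The key point the toy preserves is convexity: because the control and the counterfactual state are both linear in the parameter M, each surrogate loss is convex in M, which is what makes online gradient descent applicable at all.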

Methods

- Choose L0, K0 such that A − BK0 and A − L0C are stable.
- Perturb L ← (1 + ε1u1)L and K ← (1 + ε2u2)K, where u1, u2 ∼ Uniform[0, 1].
- Check that the estimated closed-loop system and the true closed loop are close.
- In the interest of brevity, the authors defer the details of the above to future work.
- Proof of Theorem 10
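The perturb-and-verify steps above can be sketched numerically. In this toy sketch, the matrices, the perturbation scale `eps`, and the use of the spectral radius as the closed-loop stability check are our illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def spectral_radius(M):
    """Largest eigenvalue magnitude; < 1 means the discrete-time map is stable."""
    return max(abs(np.linalg.eigvals(M)))

# Toy system with stabilizing state-feedback gain K0 and observer gain L0.
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
B = np.eye(2)
C = np.eye(2)
K0 = 0.2 * np.eye(2)      # A - B @ K0 is stable
L0 = 0.3 * np.eye(2)      # A - L0 @ C is stable
assert spectral_radius(A - B @ K0) < 1
assert spectral_radius(A - L0 @ C) < 1

# Perturbation step: random multiplicative scaling of the gains.
eps = 0.1                              # perturbation scale (our choice)
u1, u2 = rng.uniform(0, 1, size=2)
L = (1 + eps * u1) * L0
K = (1 + eps * u2) * K0

# Verification step: the perturbed observer/controller closed loop, in the
# standard (state, estimation-error) coordinates, must remain stable.
closed_loop = np.block([
    [A - B @ K, B @ K],
    [np.zeros_like(A), A - L @ C],
])
assert spectral_radius(closed_loop) < 1
print("perturbed closed loop stable, rho =", spectral_radius(closed_loop))
```

Because the closed-loop matrix is block upper-triangular, its eigenvalues are those of A − BK and A − LC together, so checking the two perturbed gains separately would suffice here.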

Results

**Main Results for Non-Stochastic Control**

For simplicity, the authors assume a finite horizon T; extensions to infinite horizon can be obtained by a doubling trick.
- The authors assume the learner has foreknowledge of the relevant decay parameters and system norms.
- Throughout, let dmin = min{dy, du} and dmax = max{dy, du}.
- Let C > 0, ρ ∈ (0, 1) and δ ∈ (0, 1).
- The authors further assume that the system decay ψG satisfies ∑_{i≥n} ‖C Aⁱ‖_op ≤ ψG(n), and that ψG and the comparator decay ψ satisfy ψ(n), ψG(n) ≤ Cρⁿ
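The doubling trick mentioned above works by restarting the finite-horizon algorithm on epochs of length 1, 2, 4, …; since per-epoch regrets of order √(epoch length) form a geometric series, the total stays of order √T with no foreknowledge of T. A minimal scheduling sketch, where `run_epoch` is a hypothetical stand-in whose √-regret is simulated rather than produced by a real controller:

```python
import math

def run_epoch(horizon):
    """Hypothetical finite-horizon learner; its regret is modeled here as
    c * sqrt(horizon) purely for illustration."""
    return 2.0 * math.sqrt(horizon)

def doubling_trick(total_steps):
    """Cover an unknown (or infinite) horizon with epochs of length 1, 2, 4, ...,
    restarting the finite-horizon learner at the start of each epoch."""
    t, regret, k = 0, 0.0, 0
    while t < total_steps:
        epoch_len = min(2 ** k, total_steps - t)  # last epoch may be truncated
        regret += run_epoch(epoch_len)
        t += epoch_len
        k += 1
    return regret

T = 10_000
r = doubling_trick(T)
# Sum over epochs of sqrt(2^k) is a geometric series, so the total remains
# O(sqrt(T)): at most a constant factor (about sqrt(2)/(sqrt(2)-1) ≈ 3.41)
# worse than a single run that knew T in advance.
print(r, "vs single run that knew T:", 2.0 * math.sqrt(T))
```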

Conclusion

- This work presented a new adaptive controller the authors termed Gradient Feedback Control (GFC), inspired by the Youla parametrization
- This method is suitable for controlling systems with partial observation, where the authors show an efficient algorithm that attains the first sublinear regret bounds under adversarial noise for both known and unknown systems.
- This technique attains optimal regret rates for many regimes of interest.
- The authors hope to understand how to use these convex parametrizations for related problem formulations, such as robustness to system misspecification, safety constraints, and distributed control

Summary


- Table1: Summary of our results for online control
- Table2: The above table describes regret rates for existing algorithms in various settings. The results are grouped into known-system and unknown-system settings, and bold lines further divide the results into non-stochastic and stochastic/semi-adversarial regimes. In each of the four resulting regimes our results strictly improve upon prior art. Noise types: stochastic noise means well-conditioned noise that is bounded or light-tailed, non-stochastic noise means noise selected by an arbitrary adversary, and semi-adversarial noise is an intermediate regime described formally by Assumption 6/6b. Comparator: compared to past work in non-stochastic control, we compete with stabilizing LDCs, which strictly generalize state feedback control. We note however that for stochastic linear control with fixed quadratic costs, state feedback is optimal, up to additive constants that do not grow with the horizon T

Funding

- Elad Hazan acknowledges funding from NSF grant 1704860
- Max Simchowitz is generously supported by an Open Philanthropy graduate student fellowship

References

- Yasin Abbasi-Yadkori and Csaba Szepesvari. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1–26, 2011.
- Yasin Abbasi-Yadkori, Peter Bartlett, and Varun Kanade. Tracking adversarial targets. In International Conference on Machine Learning, pages 369–377, 2014.
- Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh. Online control with adversarial disturbances. In International Conference on Machine Learning, pages 111–119, 2019a.
- Naman Agarwal, Elad Hazan, and Karan Singh. Logarithmic regret for online control. In Advances in Neural Information Processing Systems 32, pages 10175–1018. Curran Associates, Inc., 2019b.
- Oren Anava, Elad Hazan, and Shie Mannor. Online learning for adversaries with memory: price of past mistakes. In Advances in Neural Information Processing Systems, pages 784–792, 2015.
- Sanjeev Arora, Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, and Yi Zhang. Towards provable control for unknown linear dynamical systems. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BygpQlbA-.
- Tamer Basar and Pierre Bernhard. H-infinity optimal control and related minimax design problems: a dynamic game approach. Springer Science & Business Media, 2008.
- Dimitri Bertsekas. Dynamic programming and optimal control, volume 1. Athena scientific Belmont, MA, 2005.
- Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594–603. ACM, 2014.
- Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.
- Alon Cohen, Avinatan Hasidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar. Online linear quadratic control. In International Conference on Machine Learning, pages 1029–1038, 2018.
- Alon Cohen, Tomer Koren, and Yishay Mansour. Learning linear-quadratic regulators efficiently with only O(√T) regret. In International Conference on Machine Learning, pages 1300–1309, 2019.
- Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4188–4197, 2018.
- Ofer Dekel and Elad Hazan. Better rates for any adversarial deterministic MDP. In International Conference on Machine Learning, pages 675–683, 2013.
- Olivier Devolder, Francois Glineur, and Yurii Nesterov. First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146(1-2):37–75, 2014.
- Eyal Even-Dar, Sham M Kakade, and Yishay Mansour. Online Markov decision processes. Mathematics of Operations Research, 34(3):726–736, 2009.
- Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, and George Michailidis. Input perturbations for adaptive regulation and learning. arXiv preprint arXiv:1811.04258, 2018.
- Maryam Fazel, Rong Ge, Sham M Kakade, and Mehran Mesbahi. Global convergence of policy gradient methods for the linear quadratic regulator. arXiv preprint arXiv:1801.05039, 2018.
- Paul J Goulart, Eric C Kerrigan, and Jan M Maciejowski. Optimization over state feedback policies for robust control with constraints. Automatica, 42(4):523–533, 2006.
- Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2 (3-4):157–325, 2016. ISSN 2167-3888. doi: 10.1561/2400000013. URL http://dx.doi.org/10.1561/2400000013.
- Elad Hazan, Karan Singh, and Cyril Zhang. Learning linear dynamical systems via spectral filtering. In Advances in Neural Information Processing Systems, pages 6702–6712, 2017.
- Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, and Yi Zhang. Spectral filtering for general linear dynamical systems. In Advances in Neural Information Processing Systems, pages 4634–4643, 2018.
- Elad Hazan, Sham M. Kakade, and Karan Singh. The nonstochastic control problem. arXiv preprint arXiv:1911.12178, 2019.
- Emilie Kaufmann, Olivier Cappe, and Aurelien Garivier. On the complexity of best-arm identification in multi-armed bandit models. The Journal of Machine Learning Research, 17(1):1–42, 2016.
- Vladimír Kučera. Stability of discrete linear feedback systems. IFAC Proceedings Volumes, 8(1):573–578, 1975.
- Lennart Ljung. System identification. Wiley Encyclopedia of Electrical and Electronics Engineering, pages 1–19, 1999.
- Horia Mania, Stephen Tu, and Benjamin Recht. Certainty equivalent control of lqr is efficient. arXiv preprint arXiv:1902.07826, 2019.
- Ankur Moitra, William Perry, and Alexander S Wein. How robust are reconstruction thresholds for community detection? In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 828–841. ACM, 2016.
- Samet Oymak and Necmiye Ozay. Non-asymptotic identification of lti systems from a single trajectory. In 2019 American Control Conference (ACC), pages 5655–5661. IEEE, 2019.
- Ariadna Quattoni, Xavier Carreras, Michael Collins, and Trevor Darrell. An efficient projection for ℓ1,∞ regularization. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 857–864. ACM, 2009.
- Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite-time system identification for partially observed lti systems of unknown order. arXiv preprint arXiv:1902.01848, 2019.
- Shai Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
- Max Simchowitz and Dylan J. Foster. Naive exploration is optimal for online lqr. arXiv preprint arXiv:2001.09576, 2020.
- Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. In Conference On Learning Theory, pages 439–473, 2018.
- Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning linear dynamical systems with semi-parametric least squares. In Conference on Learning Theory, pages 2714–2802, 2019.
- Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
- Robert F Stengel. Optimal control and estimation. Courier Corporation, 1994.
- Anastasios Tsiamis and George J Pappas. Finite sample analysis of stochastic system identification. arXiv preprint arXiv:1903.09122, 2019.
- Yuh-Shyang Wang, Nikolai Matni, and John C Doyle. A system level approach to controller synthesis. IEEE Transactions on Automatic Control, 2019.
- Dante Youla, Hamid Jabr, and Jr Bongiorno. Modern Wiener-Hopf design of optimal controllers–Part II: The multivariable case. IEEE Transactions on Automatic Control, 21(3):319–338, 1976.
- Alexander Zimin and Gergely Neu. Online learning in episodic Markovian decision processes by relative entropy policy search. In Advances in Neural Information Processing Systems, pages 1583–1591, 2013.
