Kargin, Taylan and Lale, Sahin and Azizzadenesheli, Kamyar and Anandkumar, Anima and Hassibi, Babak (2022) Thompson Sampling Achieves Õ(√T) Regret in Linear Quadratic Control. Proceedings of Machine Learning Research, 178 . pp. 3235-3284. ISSN 2640-3498. https://resolver.caltech.edu/CaltechAUTHORS:20220714-212445251
![]() |
PDF
- Published Version
See Usage Policy. 617kB |
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220714-212445251
Abstract
Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that Õ(√T) frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control, TSAC, that attains Õ(√T)regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on developing a novel lower bound on the probability that the TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and the efficiency of TSAC in several adaptive control tasks.
Item Type: | Article | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Related URLs: |
| ||||||||||||
ORCID: |
| ||||||||||||
Additional Information: | © 2022 T. Kargin, S. Lale, K. Azizzadenesheli, A. Anandkumar & B. Hassibi. | ||||||||||||
Subject Keywords: | Thompson sampling, adaptive control, linear quadratic control, regret | ||||||||||||
Record Number: | CaltechAUTHORS:20220714-212445251 | ||||||||||||
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20220714-212445251 | ||||||||||||
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. | ||||||||||||
ID Code: | 115576 | ||||||||||||
Collection: | CaltechAUTHORS | ||||||||||||
Deposited By: | George Porter | ||||||||||||
Deposited On: | 14 Jul 2022 23:18 | ||||||||||||
Last Modified: | 23 Dec 2022 00:52 |
Repository Staff Only: item control page