Prajapat, Manish and Azizzadenesheli, Kamyar and Liniger, Alexander and Yue, Yisong and Anandkumar, Anima (2020) Competitive Policy Optimization. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20201106-120215567
Abstract
A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations, and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods, and empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.
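The contrast between linear and bilinear approximations can be illustrated on a toy problem. The sketch below is not CoPO and is not taken from the paper; it assumes a scalar zero-sum game f(x, y) = x * y (x minimizes, y maximizes) and compares simultaneous gradient descent-ascent, which uses only each player's own gradient, with a competitive gradient descent step, which also uses the mixed derivative that couples the two players. The function names, step size, and starting point are illustrative choices.

```python
import math


def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y.

    Each player uses only a linear model of f in its own variable.
    """
    return x - eta * y, y + eta * x


def cgd_step(x, y, eta):
    """Competitive gradient descent step on f(x, y) = x * y.

    x minimizes f and y maximizes f. The mixed derivative d2f/dxdy = 1
    enters through the bilinear term of each player's local model.
    """
    scale = 1.0 / (1.0 + eta ** 2)
    dx = -eta * scale * (y + eta * x)
    dy = eta * scale * (x - eta * y)
    return x + dx, y + dy


def run(step, x=1.0, y=1.0, eta=0.2, n_steps=100):
    """Iterate a step rule and return the distance to the equilibrium (0, 0)."""
    for _ in range(n_steps):
        x, y = step(x, y, eta)
    return math.hypot(x, y)


gda_dist = run(gda_step)
cgd_dist = run(cgd_step)
print(f"GDA distance from equilibrium: {gda_dist:.3f}")
print(f"CGD distance from equilibrium: {cgd_dist:.3f}")
```

On this toy game the squared distance to the equilibrium grows by a factor of 1 + eta^2 per descent-ascent step and shrinks by the same factor per competitive step, so the first iteration spirals outward while the second contracts toward (0, 0).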
| Item Type: | Report or Paper (Discussion Paper) |
|---|---|
| Additional Information: | The main body of this work took place when M. Prajapat was a visiting scholar at Caltech. The authors would like to thank Florian Schäfer for his support. M. Prajapat is thankful to the Zeno Karl Schindler Foundation for providing him with a Master's thesis grant. K. Azizzadenesheli is supported in part by Raytheon and Amazon Web Services. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAIHR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships. |
| DOI: | 10.48550/arXiv.2006.10611 |
| Record Number: | CaltechAUTHORS:20201106-120215567 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20201106-120215567 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 106490 |
| Collection: | CaltechAUTHORS |
| Deposited By: | George Porter |
| Deposited On: | 06 Nov 2020 22:45 |
| Last Modified: | 02 Jun 2023 01:08 |