CaltechAUTHORS
  A Caltech Library Service

Competitive Policy Optimization

Prajapat, Manish and Azizzadenesheli, Kamyar and Liniger, Alexander and Yue, Yisong and Anandkumar, Anima (2020) Competitive Policy Optimization. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20201106-120215567

PDF (Submitted Version), 3MB. See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20201106-120215567

Abstract

A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods, and empirically investigate their behavior on a set of comprehensive yet challenging competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.
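The linear-versus-bilinear distinction in the abstract can be sketched as follows. This is a sketch following the competitive gradient method the abstract cites, not text from this record: here f(x, y) denotes a two-player zero-sum game objective, with x the parameters of the minimizing player and y those of the maximizing player, and the mixed second-derivative term is the interaction term that off-the-shelf policy gradient methods drop.

```latex
% Standard (simultaneous) policy gradient: each player linearizes the
% game objective in its own parameters only.
f(x+\Delta x,\, y+\Delta y) \;\approx\; f(x,y)
  + \Delta x^{\top}\nabla_x f + \Delta y^{\top}\nabla_y f

% Bilinear approximation (competitive gradient): a mixed term couples
% the two players' updates, so each update anticipates the other's.
f(x+\Delta x,\, y+\Delta y) \;\approx\; f(x,y)
  + \Delta x^{\top}\nabla_x f + \Delta y^{\top}\nabla_y f
  + \Delta x^{\top}\, D^{2}_{xy} f \,\Delta y
```

The added cross term \(\Delta x^{\top} D^{2}_{xy} f\, \Delta y\) is what lets the resulting updates capture interactions among the players, which the abstract identifies as the gap in linear-approximation methods.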


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
http://arxiv.org/abs/2006.10611 | arXiv | Discussion Paper
ORCID:
Author | ORCID
Prajapat, Manish | 0000-0002-3867-4575
Azizzadenesheli, Kamyar | 0000-0001-8507-1868
Liniger, Alexander | 0000-0002-7858-7900
Yue, Yisong | 0000-0001-9127-1989
Anandkumar, Anima | 0000-0002-6974-6797
Additional Information: The main body of this work took place when M. Prajapat was a visiting scholar at Caltech. The authors would like to thank Florian Schäfer for his support. M. Prajapat is thankful to the Zeno Karl Schindler Foundation for providing him with a Master thesis grant. K. Azizzadenesheli is supported in part by Raytheon and Amazon Web Services. A. Anandkumar is supported in part by the Bren endowed chair, DARPA PAI (HR00111890035) and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships.
Funders:
Funding Agency | Grant Number
Zeno Karl Schindler Foundation | UNSPECIFIED
Raytheon Company | UNSPECIFIED
Amazon Web Services | UNSPECIFIED
Bren Professor of Computing and Mathematical Sciences | UNSPECIFIED
Defense Advanced Research Projects Agency (DARPA) | HR00111890035
Learning with Less Labels (LwLL) | UNSPECIFIED
Microsoft Faculty Fellowship | UNSPECIFIED
Google Faculty Research Award | UNSPECIFIED
Adobe | UNSPECIFIED
DOI: 10.48550/arXiv.2006.10611
Record Number: CaltechAUTHORS:20201106-120215567
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20201106-120215567
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 106490
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 06 Nov 2020 22:45
Last Modified: 02 Jun 2023 01:08
