Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Zhang, Yizhou and Qu, Guannan and Xu, Pan and Lin, Yiheng and Chen, Zaiwei and Wierman, Adam (2023) Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7 (1). Art. No. 13. ISSN 2476-1249. doi:10.1145/3579443. https://resolver.caltech.edu/CaltechAUTHORS:20230316-87864000.2

PDF - Published Version (Creative Commons Attribution), 979kB

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20230316-87864000.2

Abstract

We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in κ. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI.


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1145/3579443 | DOI | Article
https://resolver.caltech.edu/CaltechAUTHORS:20230316-204011712 | Related Item | Discussion Paper
ORCID:
Author | ORCID
Zhang, Yizhou | 0000-0002-5677-4748
Qu, Guannan | 0000-0002-5466-3550
Xu, Pan | 0000-0002-2559-8622
Lin, Yiheng | 0000-0001-6524-2877
Chen, Zaiwei | 0000-0001-9915-5595
Wierman, Adam | 0000-0002-5923-0199
Additional Information: © 2023 held by the owner/author(s). This work is licensed under a Creative Commons Attribution International 4.0 License. Yizhou Zhang, Guannan Qu, Pan Xu contributed equally to this work. Guannan Qu is supported by NSF Grant EPCN-2154171 and C3 AI Institute. Pan Xu is supported by the startup funding at the Department of Biostatistics and Bioinformatics at Duke University. Yiheng Lin is supported by PIMCO Graduate Fellowship. Zaiwei Chen is supported by PIMCO Postdoc Fellowship and Simoudis Discovery Prize. Adam Wierman is supported by NSF Grants CNS-2146814, CPS-2136197, CNS-2106403, NGSDI-2105648, with additional support from Amazon AWS.
Funders:
Funding Agency | Grant Number
NSF | CNS-2154171
C3 AI Institute | UNSPECIFIED
Duke University | UNSPECIFIED
PIMCO | UNSPECIFIED
Simoudis Discovery Prize | UNSPECIFIED
NSF | CNS-2146814
NSF | ECCS-2136197
NSF | CNS-2106403
NSF | CNS-2105648
Amazon Web Services | UNSPECIFIED
Issue or Number: 1
DOI: 10.1145/3579443
Record Number: CaltechAUTHORS:20230316-87864000.2
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20230316-87864000.2
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 120112
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 17 Mar 2023 21:32
Last Modified: 17 Mar 2023 21:32
