Published June 2023 | Version v1
Conference paper | Open access

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

  • California Institute of Technology

Abstract

We study a multi-agent reinforcement learning (MARL) problem in which the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy whose optimality gap decays polynomially in κ. In addition, we establish the finite-sample convergence of LPI to the globally optimal policy, a result that explicitly captures the trade-off between optimality and computational complexity in the choice of κ. Numerical simulations demonstrate the effectiveness of LPI. This extended abstract is an abridged version of [12].
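To make the κ-hop localization concrete, below is a minimal Python sketch of the two ingredients the abstract names: restricting each agent's view to its κ-hop neighborhood, and a softmax (entropy-regularized) policy step. This is an illustrative sketch only, not the LPI implementation from [12]; the function names (kappa_hop_neighborhood, soft_policy) and the toy line graph are hypothetical.

    import math
    from collections import deque

    def kappa_hop_neighborhood(adj, i, kappa):
        # Breadth-first search out to depth kappa from agent i; agent i's
        # localized policy conditions only on the states of these agents.
        seen, frontier = {i}, deque([(i, 0)])
        while frontier:
            j, d = frontier.popleft()
            if d == kappa:
                continue
            for nb in adj[j]:
                if nb not in seen:
                    seen.add(nb)
                    frontier.append((nb, d + 1))
        return sorted(seen)

    def soft_policy(q_row, tau):
        # Entropy-regularized step: pi(a | local state) is the softmax of
        # the local Q-values at temperature tau (stabilized by subtracting
        # the max before exponentiating).
        m = max(q_row)
        w = [math.exp((q - m) / tau) for q in q_row]
        z = sum(w)
        return [x / z for x in w]

    # Toy example: 5 agents on a line graph with kappa = 1. Agent 2's
    # policy then depends only on the states of agents {1, 2, 3}.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    for i in adj:
        print(i, kappa_hop_neighborhood(adj, i, kappa=1))
    print(soft_policy([1.0, 2.0, 0.5], tau=0.5))

Per the abstract, enlarging κ shrinks the optimality gap (polynomially in κ) while enlarging each agent's local state space; this is the optimality/complexity trade-off that the finite-sample convergence result makes explicit.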

Copyright and License

© 2023 Copyright held by the owner/author(s).

Contributions

Yizhou Zhang, Guannan Qu, and Pan Xu contributed equally to this work.

Files

3578338.3593545.pdf (875.9 kB)
md5:f45d4171d61156eff48fdf6c24b13a02

Additional details

Funding

Amazon AWS
PIMCO Postdoc Fellowship
National Science Foundation (grants 2154171, 2146814, 2136197, 2106403, 2105648)
C3 AI Institute
Simoudis Discovery Prize
PIMCO Graduate Fellowship

Dates

Accepted: 2023-06-19 (published in print)
Accepted: 2023-06-19 (published online)