A Caltech Library Service

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Mahajan, Anuj and Samvelyan, Mikayel and Mao, Lei and Makoviychuk, Viktor and Garg, Animesh and Kossaifi, Jean and Whiteson, Shimon and Zhu, Yuke and Anandkumar, Animashree (2021) Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning. Proceedings of Machine Learning Research, 139. pp. 7301-7312. ISSN 2640-3498.

PDF - Published Version
See Usage Policy.

PDF - Accepted Version
Creative Commons Attribution.

PDF - Supplemental Material
See Usage Policy.

Reinforcement learning in large action spaces is a challenging problem. This is especially true for cooperative multi-agent reinforcement learning (MARL), which often requires tractable learning while respecting various constraints, such as a communication budget and restricted information about other agents. In this work, we focus on a fundamental hurdle affecting both value-based and policy-gradient approaches: the exponential blowup of the joint action space with the number of agents. For value-based methods, it makes the optimal value function hard to represent accurately, thus inducing suboptimality. For policy-gradient methods, it renders the critic ineffective and exacerbates the problem of the lagging critic. We show that, from a learning-theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires modelling the agent interactions accurately and in a sample-efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method, Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of the different agents. Algorithms derived from Tesseract decompose the Q-tensor across the agents and use low-rank tensor approximations to model the agent interactions relevant to the task. We provide a PAC analysis of Tesseract-based algorithms and highlight their relevance to the class of rich-observation MDPs. Empirical results in different domains confirm the gains in sample efficiency from Tesseract that the theory predicts.
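The core idea in the abstract, treating the joint Q-function for a given state as a tensor with one mode per agent and approximating it with a low-rank decomposition, can be illustrated with a small sketch. It assumes a rank-k CP (CANDECOMP/PARAFAC) form for one fixed state s, Q(s, a_1, ..., a_n) ≈ Σ_{r=1..k} w_r Π_{i=1..n} U_i[a_i][r], where U_i is a per-agent factor matrix and w holds component weights. All helper names (`dense_size`, `cp_size`, `random_cp_factors`, `q_value`, `full_tensor`) are hypothetical, and the random factors merely stand in for whatever learned, state-conditioned networks an actual Tesseract-based algorithm would use. The sketch only shows how storage drops from exponential to linear in the number of agents; it is not the authors' implementation.

```python
from itertools import product
from math import prod
import random


def dense_size(action_sizes):
    """Entries in a dense Q-tensor for one state: |A_1| * ... * |A_n|."""
    return prod(action_sizes)


def cp_size(action_sizes, rank):
    """Parameters of a rank-`rank` CP form: one |A_i| x rank factor per agent, plus `rank` weights."""
    return rank * sum(action_sizes) + rank


def random_cp_factors(action_sizes, rank, seed=0):
    """Hypothetical stand-in for learned factors: random per-agent matrices and weights."""
    rng = random.Random(seed)
    factors = [
        [[rng.uniform(-1.0, 1.0) for _ in range(rank)] for _ in range(n_actions)]
        for n_actions in action_sizes
    ]
    weights = [rng.uniform(-1.0, 1.0) for _ in range(rank)]
    return factors, weights


def q_value(factors, weights, joint_action):
    """Evaluate Q(a_1, ..., a_n) = sum_r w_r * prod_i U_i[a_i][r] without building the tensor."""
    total = 0.0
    for r, w in enumerate(weights):
        term = w
        for agent, action in enumerate(joint_action):
            term *= factors[agent][action][r]
        total += term
    return total


def full_tensor(factors, weights):
    """Materialise every entry as {joint_action: value}; only feasible for tiny examples."""
    sizes = [len(f) for f in factors]
    joint_actions = product(*(range(s) for s in sizes))
    return {ja: q_value(factors, weights, ja) for ja in joint_actions}


if __name__ == "__main__":
    sizes = [5] * 8  # hypothetical: 8 agents with 5 actions each
    print(dense_size(sizes))       # 390625 entries in the dense tensor
    print(cp_size(sizes, rank=4))  # 164 parameters in the rank-4 CP form
    factors, weights = random_cp_factors([3, 4, 2], rank=2)
    print(len(full_tensor(factors, weights)))  # 24 joint actions
```

Evaluating a single entry with `q_value` costs O(k·n), whereas the dense table grows as the product of the per-agent action counts. Note that the low-rank form is an approximation: how well it captures the interactions relevant to a task depends on the chosen rank.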

Item Type: Article
Author | ORCID
Garg, Animesh | 0000-0003-0482-4296
Kossaifi, Jean | 0000-0002-4445-3429
Zhu, Yuke | 0000-0002-9198-2227
Additional Information: © 2021 The authors. AM is funded by the J.P. Morgan A.I. fellowship. Part of this work was done during AM’s internship at NVIDIA. This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 637713). The experiments were made possible by a generous equipment grant from NVIDIA.
Funding Agency | Grant Number
J.P. Morgan A.I. fellowship | UNSPECIFIED
European Research Council (ERC) | 637713
Record Number: CaltechAUTHORS:20210831-203904421
Official Citation: Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar. Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7301-7312, 2021.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 110647
Deposited By: George Porter
Deposited On: 01 Sep 2021 14:47
Last Modified: 01 Sep 2021 14:47