CaltechAUTHORS
  A Caltech Library Service

Iterative Amortized Policy Optimization

Marino, Joseph and Piché, Alexandre and Ialongo, Alessandro Davide and Yue, Yisong (2020) Iterative Amortized Policy Optimization. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20201110-082336091

PDF - Submitted Version (4MB)
See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20201110-082336091

Abstract

Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control, enabling the estimation and sampling of high-value actions. From the variational inference perspective on RL, policy networks, when employed with entropy or KL regularization, are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly. However, this direct amortized mapping can empirically yield suboptimal policy estimates. Given this perspective, we consider the more flexible class of iterative amortized optimizers. We demonstrate that the resulting technique, iterative amortized policy optimization, yields performance improvements over conventional direct amortization methods on benchmark continuous control tasks.
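To illustrate the contrast the abstract draws, the following is a minimal toy sketch, not the authors' implementation. It assumes a one-dimensional policy mean and a hypothetical concave objective peaked at `2 * state`. The "direct" policy is a single fixed mapping from state to estimate, while the "iterative" policy refines its estimate using gradient feedback from the objective. In an iterative amortized optimizer the update would be produced by a learned network; here a fixed step size stands in for that network, so the loop reduces to plain gradient ascent. All function names and constants are illustrative assumptions.

```python
def objective(state, mean):
    """Hypothetical toy objective: concave quadratic peaked at 2 * state."""
    return -(mean - 2.0 * state) ** 2


def objective_grad(state, mean):
    """Gradient of the toy objective with respect to the policy mean."""
    return -2.0 * (mean - 2.0 * state)


def direct_policy(state, weight=1.5):
    """Direct amortization: a single pass maps the state to a mean estimate.

    The weight is deliberately imperfect (1.5 rather than 2.0) to mimic a
    suboptimal direct estimate.
    """
    return weight * state


def iterative_policy(state, num_steps=5, step_size=0.25, init_mean=0.0):
    """Iterative refinement: update the estimate from gradient feedback.

    Assumption: the fixed step_size is a stand-in for a learned update
    network, so this is ordinary gradient ascent on the toy objective.
    """
    mean = init_mean
    for _ in range(num_steps):
        mean = mean + step_size * objective_grad(state, mean)
    return mean


if __name__ == "__main__":
    s = 1.0
    print("direct:   ", direct_policy(s), objective(s, direct_policy(s)))
    print("iterative:", iterative_policy(s), objective(s, iterative_policy(s)))
```

In this toy setting the refined estimate ends closer to the optimum than the single-pass estimate, because each step uses the objective's gradient at the current estimate rather than relying on a fixed mapping alone. This says nothing about the benchmark results reported in the paper.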


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
http://arxiv.org/abs/2010.10670 | arXiv | Discussion Paper
https://github.com/joelouismarino/variational_rl | Related Item | Code
ORCID:
Author | ORCID
Marino, Joseph | 0000-0001-6387-8062
Yue, Yisong | 0000-0001-9127-1989
Additional Information: JM thanks Scott Fujimoto for helpful discussions. This project was funded in part by Beyond Limits and Raytheon. Author Contributions: JM conceived the project, implemented the code, performed the experiments, and wrote the paper. AP advised on reinforcement learning, provided the initial value estimation implementation, and reviewed the paper. ADI provided guidance on model learning, helped with model implementation, and reviewed the paper. YY advised on the overall project and reviewed the paper.
Funders:
Funding Agency | Grant Number
Beyond Limits | UNSPECIFIED
Raytheon Company | UNSPECIFIED
Record Number: CaltechAUTHORS:20201110-082336091
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20201110-082336091
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 106584
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 10 Nov 2020 16:28
Last Modified: 10 Nov 2020 16:28
