CaltechAUTHORS
  A Caltech Library Service

Trust Region Policy Optimization for POMDPs

Azizzadenesheli, Kamyar and Bera, Manish Kumar and Anandkumar, Animashree (2018) Trust Region Policy Optimization for POMDPs. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:20190327-085807408

PDF - Submitted Version (6 MB). See Usage Policy.

Abstract

We propose Generalized Trust Region Policy Optimization (GTRPO), a policy gradient Reinforcement Learning (RL) algorithm for both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs). Policy gradient is a class of model-free RL methods. Previous policy gradient methods are guaranteed to converge only when the underlying model is an MDP and the policy is run for an infinite horizon. We relax these assumptions to episodic settings and to partially observable models with memory-less policies. For the latter class, GTRPO uses a variant of the Q-function with only three consecutive observations for each policy update, and hence is computationally efficient. We show theoretically that the policy updates in GTRPO monotonically improve the expected cumulative return, and hence GTRPO has convergence guarantees.


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
http://arxiv.org/abs/1810.07900 | arXiv | Discussion Paper
Additional Information: K. Azizzadenesheli is supported in part by NSF Career Award CCF-1254106 and Air Force FA9550-15-1-0221. A. Anandkumar is supported in part by Microsoft Faculty Fellowship, Google Faculty Research Award, Adobe Grant, NSF Career Award CCF-1254106, and AFOSR YIP FA9550-15-1-0221.
Funders:
Funding Agency | Grant Number
NSF | CCF-1254106
Air Force Office of Scientific Research (AFOSR) | FA9550-15-1-0221
Microsoft Faculty Fellowship | UNSPECIFIED
Google Faculty Research Award | UNSPECIFIED
Adobe | UNSPECIFIED
Record Number: CaltechAUTHORS:20190327-085807408
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20190327-085807408
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 94179
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 28 Mar 2019 14:32
Last Modified: 28 Mar 2019 14:32
