CaltechAUTHORS
  A Caltech Library Service

Learning Causal State Representations of Partially Observable Environments

Zhang, Amy and Lipton, Zachary C. and Pineda, Luis and Azizzadenesheli, Kamyar and Anandkumar, Anima and Itti, Laurent and Pineau, Joelle and Furlanello, Tommaso (2019) Learning Causal State Representations of Partially Observable Environments. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20190905-154244448

PDF - Submitted Version (4MB). See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20190905-154244448

Abstract

Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially-observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents using multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvement of the DQN and tabular models using approximate causal state representations with respect to recurrent-DQN baselines trained with raw inputs.
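To make the notion of a causal state concrete, the following is a minimal sketch under stated assumptions, not the algorithm proposed in the paper. Causal states group together histories that induce the same predictive distribution over future observations. The sketch replaces the RNN predictor with simple count-based estimates, ignores actions, uses only the next-symbol distribution rather than the full future, and runs on a hypothetical toy process (a binary sequence with no two consecutive 1s). All function names and the tolerance `tol` are illustrative choices.

```python
import random
from collections import Counter, defaultdict


def sample_golden_mean(n, seed=0):
    """Sample a binary sequence that never contains two consecutive 1s."""
    rng = random.Random(seed)
    seq, prev = [], 0
    for _ in range(n):
        # After a 1 the next symbol must be 0; after a 0 it is a fair coin flip.
        cur = 0 if prev == 1 else rng.randint(0, 1)
        seq.append(cur)
        prev = cur
    return seq


def predictive_distributions(seq, k):
    """Estimate P(next symbol = 1 | last k symbols) from empirical counts."""
    counts = defaultdict(Counter)
    for t in range(k, len(seq)):
        counts[tuple(seq[t - k:t])][seq[t]] += 1
    return {h: c[1] / sum(c.values()) for h, c in counts.items()}


def group_histories(pred, tol=0.1):
    """Greedily merge histories whose predictions differ by less than tol.

    Each resulting group is an approximate causal state (assumption: the
    next-symbol distribution is enough to tell states apart in this toy).
    """
    groups = []  # list of (representative probability, member histories)
    for h, p in sorted(pred.items()):
        for rep, members in groups:
            if abs(rep - p) < tol:
                members.append(h)
                break
        else:
            groups.append((p, [h]))
    return [members for _, members in groups]


seq = sample_golden_mean(20000)
states = group_histories(predictive_distributions(seq, k=2))
print(states)
```

In this toy, the length-2 histories ending in 0 are merged into one state and the history ending in 1 forms another, so two approximate causal states summarize all observed histories. A learned predictor such as an RNN plays the role of `predictive_distributions` when observations are too rich for counting.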


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
http://arxiv.org/abs/1906.10437 | arXiv | Discussion Paper
ORCID:
Author | ORCID
Azizzadenesheli, Kamyar | 0000-0001-8507-1868
Additional Information: Part of this work was supported by the National Science Foundation (grant number CCF-1317433), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. A. Anandkumar is supported in part by the Bren endowed chair, DARPA PAI, Raytheon, and Microsoft, Google and Adobe faculty fellowships. K. Azizzadenesheli is supported in part by NSF Career Award CCF-1254106 and AFOSR YIP FA9550-15-1-0221; this work was done while he was visiting Caltech. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
Funders:
Funding Agency | Grant Number
NSF | CCF-1317433
Semiconductor Research Corporation | UNSPECIFIED
Defense Advanced Research Projects Agency (DARPA) | UNSPECIFIED
Intel | UNSPECIFIED
Bren Professor of Computing and Mathematical Sciences | UNSPECIFIED
Raytheon Company | UNSPECIFIED
Microsoft Faculty Fellowship | UNSPECIFIED
Google Faculty Research Award | UNSPECIFIED
Adobe | UNSPECIFIED
NSF | CCF-1254106
Air Force Office of Scientific Research (AFOSR) | FA9550-15-1-0221
Record Number: CaltechAUTHORS:20190905-154244448
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20190905-154244448
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 98452
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 06 Sep 2019 14:52
Last Modified: 11 Nov 2020 00:56
