A Caltech Library Service

Learning Causal State Representations of Partially Observable Environments

Zhang, Amy and Lipton, Zachary C. and Pineda, Luis and Azizzadenesheli, Kamyar and Anandkumar, Anima and Itti, Laurent and Pineau, Joelle and Furlanello, Tommaso (2019) Learning Causal State Representations of Partially Observable Environments. (Unpublished)

Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially-observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents using multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvement of the DQN and tabular models using approximate causal state representations with respect to recurrent-DQN baselines trained with raw inputs.
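
To make the notion of a causal state concrete, the following is a minimal toy sketch and not the paper's RNN-based algorithm. It groups action-observation histories whose empirical next-observation distributions coincide, so that each group plays the role of one approximate causal state. The function name, the sample format, and the rounding tolerance are assumptions made for illustration. Comparing only the next observation, rather than the distribution over entire futures, is a further simplification.

```python
from collections import Counter, defaultdict


def group_histories_by_prediction(samples, decimals=2):
    """Group histories with matching empirical next-observation distributions.

    samples: iterable of (history, next_observation) pairs, where a history
    is a tuple of (action, observation) steps. This format is hypothetical
    and chosen only for the toy example.
    """
    # Count how often each next observation follows each history.
    counts = defaultdict(Counter)
    for history, next_obs in samples:
        counts[history][next_obs] += 1

    # Histories that share a (rounded) predictive distribution share a group.
    groups = defaultdict(list)
    for history, counter in counts.items():
        total = sum(counter.values())
        signature = tuple(sorted(
            (obs, round(n / total, decimals)) for obs, n in counter.items()
        ))
        groups[signature].append(history)
    return [sorted(histories) for histories in groups.values()]


# Toy data: the next observation depends only on the most recent observation,
# so four distinct histories collapse into two groups.
samples = [
    ((("go", "A"), ("go", "B")), "A"),
    ((("go", "B"), ("go", "B")), "A"),
    ((("go", "A"), ("go", "A")), "B"),
    ((("go", "B"), ("go", "A")), "B"),
]
states = group_histories_by_prediction(samples)
print(len(states))
```

In this sketch the tabular counts stand in for a learned predictor. The abstract describes obtaining the representation from a recurrent network trained to predict subsequent observations, which is what allows the approach to handle rich observation spaces where counting histories would be infeasible.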

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
Paper
ORCID:
Author | ORCID
Zhang, Amy | 0000-0002-4061-5582
Lipton, Zachary C. | 0000-0002-3824-4241
Azizzadenesheli, Kamyar | 0000-0001-8507-1868
Anandkumar, Anima | 0000-0002-6974-6797
Itti, Laurent | 0000-0002-0168-2977
Pineau, Joelle | 0000-0003-0747-7250
Furlanello, Tommaso | 0000-0003-1935-5146
Additional Information: Part of this work was supported by the National Science Foundation (grant number CCF-1317433), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAI, Raytheon, and Microsoft, Google and Adobe faculty fellowships. K. Azizzadenesheli is supported in part by NSF Career Award CCF-1254106 and AFOSR YIP FA9550-15-1-0221, work done while he was visiting Caltech. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
Funding Agency | Grant Number
Semiconductor Research Corporation | UNSPECIFIED
Defense Advanced Research Projects Agency (DARPA) | UNSPECIFIED
Bren Professor of Computing and Mathematical Sciences | UNSPECIFIED
Raytheon Company | UNSPECIFIED
Microsoft Faculty Fellowship | UNSPECIFIED
Google Faculty Research Award | UNSPECIFIED
Air Force Office of Scientific Research (AFOSR) | FA9550-15-1-0221
Record Number: CaltechAUTHORS:20190905-154244448
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 98452
Deposited By: George Porter
Deposited On: 06 Sep 2019 14:52
Last Modified: 23 Dec 2022 19:02
