Zhang, Amy and Lipton, Zachary C. and Pineda, Luis and Azizzadenesheli, Kamyar and Anandkumar, Anima and Itti, Laurent and Pineau, Joelle and Furlanello, Tommaso (2019) Learning Causal State Representations of Partially Observable Environments. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20190905-154244448
PDF: Submitted Version, 4MB. See Usage Policy.
Abstract
Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents on multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvement of DQN and tabular models that use approximate causal state representations over recurrent-DQN baselines trained on raw inputs.
Item Type: | Report or Paper (Discussion Paper)
---|---
Additional Information: | Part of this work was supported by the National Science Foundation (grant number CCF-1317433), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. A. Anandkumar is supported in part by the Bren endowed chair, DARPA PAI, Raytheon, and Microsoft, Google and Adobe faculty fellowships. K. Azizzadenesheli is supported in part by NSF Career Award CCF-1254106 and AFOSR YIP FA9550-15-1-0221, work done while he was visiting Caltech. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
Record Number: | CaltechAUTHORS:20190905-154244448
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20190905-154244448
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: | 98452
Collection: | CaltechAUTHORS
Deposited By: | George Porter
Deposited On: | 06 Sep 2019 14:52
Last Modified: | 23 Dec 2022 19:02