A Caltech Library Service

Pre-Trained Language Models for Interactive Decision-Making

Li, Shuang and Puig, Xavier and Paxton, Chris and Du, Yilun and Wang, Clinton and Fan, Linxi and Cheng, Tao and Huang, De-An and Akyürek, Ekin and Anandkumar, Anima and Andreas, Jacob and Mordatch, Igor and Torralba, Antonio and Zhu, Yuke (2022) Pre-Trained Language Models for Interactive Decision-Making. . (Unpublished)

[img] PDF - Submitted Version
See Usage Policy.


Use this Persistent URL to link to this item:


Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and generalization in general sequential decision-making problems. In this approach, goals and observations are represented as a sequence of embeddings, and a policy network initialized with a pre-trained LM predicts the next action. We demonstrate that this framework enables effective combinatorial generalization across different environments and supervisory modalities. We begin by assuming access to a set of expert demonstrations, and show that initializing policies with LMs and fine-tuning them via behavior cloning improves task completion rates by 43.6% in the VirtualHome environment. We then examine how our framework may be used in environments without pre-collected expert data. To do this, we integrate an active data gathering procedure into pre-trained LMs. The agent iteratively learns by interacting with the environment, relabeling the language goal of past 'failed' experiences, and updating the policy in a self-supervised loop. The active data gathering procedure also enables effective combinatorial generalization, outperforming the best baseline by 25.1%. Finally, we explain these results by investigating three possible factors underlying the effectiveness of the LM-based policy. We find that sequential input representations (vs. fixed-dimensional feature vectors) and favorable weight initialization are both important for generalization. Surprisingly, however, the format of the policy inputs encoding (e.g. as a natural language string vs. an arbitrary sequential encoding) has little influence. Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans.

Item Type:Report or Paper (Discussion Paper)
Related URLs:
URLURL TypeDescription Paper ItemProject website
Cheng, Tao0000-0003-4830-177X
Anandkumar, Anima0000-0002-6974-6797
Zhu, Yuke0000-0002-9198-2227
Additional Information:Part of this work was done during Shuang’s internship at NVIDIA.
Record Number:CaltechAUTHORS:20220714-224621396
Persistent URL:
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:115598
Deposited By: George Porter
Deposited On:15 Jul 2022 23:22
Last Modified:15 Jul 2022 23:22

Repository Staff Only: item control page