
Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization

Lee, Youngwoon and Lim, Joseph J. and Anandkumar, Anima and Zhu, Yuke (2021) Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization. Proceedings of Machine Learning Research, 164. pp. 406-416. ISSN 2640-3498.

PDF - Published Version
See Usage Policy.

PDF - Accepted Version
Creative Commons Attribution.

Archive (ZIP) - Supplemental Material
See Usage Policy.



Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy’s starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method is the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at

Item Type: Article
Related URLs:
URL | URL Type | Description
Paper ItemPaper website Information
Author | ORCID
Anandkumar, Anima | 0000-0002-6974-6797
Zhu, Yuke | 0000-0002-9198-2227
Additional Information: This work was initiated when Youngwoon Lee worked at NVIDIA Research as an intern. This research is also supported by the Annenberg Fellowship from USC and the Google Cloud Research Credits program with the award GCP19980904. We would like to thank Byron Boots for initial discussion; Jim Fan, De-An Huang, Christopher B. Choy, and the NVIDIA AI Algorithms team for their insightful feedback; and the USC CLVR lab members for constructive feedback.
Funding Agency | Grant Number
University of Southern California | UNSPECIFIED
Google Cloud | GCP19980904
Subject Keywords: Long-Horizon Manipulation, Skill Chaining, Reinforcement Learning
Record Number: CaltechAUTHORS:20220714-224643553
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115604
Deposited By: George Porter
Deposited On: 15 Jul 2022 23:20
Last Modified: 15 Jul 2022 23:20
