A Caltech Library Service

MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

Wang, Chen and Fan, Linxi and Sun, Jiankai and Zhang, Ruohan and Fei-Fei, Li and Xu, Danfei and Zhu, Yuke and Anandkumar, Anima (2023) MimicPlay: Long-Horizon Imitation Learning by Watching Human Play. (Unpublished)

PDF - Submitted Version
See Usage Policy.


Imitation learning from human demonstrations is a promising paradigm for teaching robots manipulation skills in the real world, but learning complex long-horizon tasks often requires an unattainable amount of demonstrations. To reduce the high data requirement, we resort to human play data: video sequences of people freely interacting with the environment using their hands. We hypothesize that, even with different morphologies, human play data contain rich and salient information about physical interactions that can readily facilitate robot policy learning. Motivated by this, we introduce a hierarchical learning framework named MimicPlay that learns latent plans from human play data to guide low-level visuomotor control trained on a small number of teleoperated demonstrations. With systematic evaluations of 14 long-horizon manipulation tasks in the real world, we show that MimicPlay dramatically outperforms state-of-the-art imitation learning methods in task success rate, generalization ability, and robustness to disturbances. More details and video results can be found on the project website.

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
Paper
Item | Project website
ORCID:
Fan, Linxi | 0000-0001-7393-3125
Sun, Jiankai | 0000-0001-5633-1739
Fei-Fei, Li | 0000-0002-7481-0810
Xu, Danfei | 0000-0002-8744-3861
Zhu, Yuke | 0000-0002-9198-2227
Anandkumar, Anima | 0000-0002-6974-6797
Additional Information: We are extremely grateful to Yifeng Zhu and Ajay Mandlekar for their efforts in developing the robot control library Deoxys and RoboTurk [30]. Stanford provided the necessary computing resources and infrastructure for this project. L. F.-F. is partially supported by the Stanford HAI Hoffman-Yee Research Grant. This work was done during Chen Wang’s internship at NVIDIA.
Funding Agency | Grant Number
Stanford University | UNSPECIFIED
Record Number: CaltechAUTHORS:20230316-153701883
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 120080
Deposited By: George Porter
Deposited On: 16 Mar 2023 22:11
Last Modified: 16 Mar 2023 22:11
