CaltechAUTHORS
  A Caltech Library Service

CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning

Huang, Kevin and Lale, Sahin and Rosolia, Ugo and Shi, Yuanyuan and Anandkumar, Anima (2021) CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20220714-224628331

PDF (Submitted Version) - 794kB. See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220714-224628331

Abstract

Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high-dimensional action spaces. First-order methods that use the gradients of the rewards with respect to the actions as an update can mitigate this issue, but suffer from local optima due to the non-convex optimization landscape. To overcome these issues and achieve the best of both worlds, we propose a novel planner, Cross-Entropy Method with Gradient Descent (CEM-GD), that combines first-order methods with CEM. At the beginning of execution, CEM-GD uses CEM to sample a large number of trajectory rollouts to explore the optimization landscape and avoid poor local minima. It then uses the top trajectories as initializations for gradient descent and applies gradient updates to each of them to find the optimal action sequence. At each subsequent time step, however, CEM-GD samples far fewer trajectories from CEM before applying gradient updates. We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant, small number of samples by using the gradient information, while avoiding local optima thanks to the initial well-sampled trajectories. Furthermore, CEM-GD achieves better performance than CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer samples per time step, resulting in around 25% less computation time and 10% less memory usage. The implementation of CEM-GD is available at https://github.com/KevinHuang8/CEM-GD.
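To make the planning loop concrete, the sketch below illustrates the two phases the abstract describes: a zeroth-order CEM phase that samples action sequences globally and refits the sampling distribution to the elites, followed by a first-order phase that refines those elite sequences by gradient ascent on the predicted return. It is a minimal illustration, not the authors' implementation (see the GitHub repository above): the point-mass dynamics, the quadratic reward, the function and parameter names, and all hyperparameter values are assumptions chosen for brevity, and PyTorch autograd stands in for the paper's differentiable model.

import torch

HORIZON = 20

def dynamics(state, action):
    # Illustrative differentiable model: a damped point mass (assumption,
    # standing in for a learned dynamics model).
    pos, vel = state[..., 0], state[..., 1]
    new_vel = 0.95 * vel + 0.1 * action
    new_pos = pos + 0.1 * new_vel
    return torch.stack([new_pos, new_vel], dim=-1)

def returns(state, actions):
    # Total predicted reward of each action sequence; actions is (N, HORIZON).
    # Reward penalizes distance from the origin and control effort (assumption).
    s = state.repeat(actions.shape[0], 1)
    total = torch.zeros(actions.shape[0])
    for t in range(actions.shape[1]):
        s = dynamics(s, actions[:, t])
        total = total - s[:, 0] ** 2 - 0.01 * actions[:, t] ** 2
    return total

def cem_gd_plan(state, n_samples=500, n_elites=10, cem_iters=3,
                gd_iters=10, lr=0.05, init_mean=None):
    # Zeroth-order phase: CEM explores the optimization landscape globally
    # by sampling action sequences and refitting mean/std to the elites.
    mean = torch.zeros(HORIZON) if init_mean is None else init_mean.clone()
    std = torch.ones(HORIZON)
    for _ in range(cem_iters):
        actions = mean + std * torch.randn(n_samples, HORIZON)
        rets = returns(state, actions)
        elites = actions[rets.topk(n_elites).indices]
        mean, std = elites.mean(0), elites.std(0) + 1e-6
    # First-order phase: refine the elite trajectories by gradient ascent
    # on the predicted return, backpropagating through the model.
    elites = elites.detach().requires_grad_(True)
    opt = torch.optim.Adam([elites], lr=lr)
    for _ in range(gd_iters):
        opt.zero_grad()
        (-returns(state, elites)).sum().backward()
        opt.step()
    # Execute the first action of the best refined sequence (MPC-style);
    # also return the full plan so the next call can warm-start from it.
    with torch.no_grad():
        best = returns(state, elites).argmax()
        return elites[best, 0].item(), elites[best].detach()

if __name__ == "__main__":
    state = torch.tensor([1.0, 0.0])
    action, plan = cem_gd_plan(state)
    print("first action:", action)

At subsequent time steps the abstract notes that CEM-GD draws far fewer samples from CEM; in this sketch that would correspond to calling cem_gd_plan again with a much smaller n_samples, passing the previous solution shifted by one step as init_mean.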


Item Type: Report or Paper (Discussion Paper)
Related URLs:
  URL                                         URL Type      Description
  https://doi.org/10.48550/arXiv.2112.07746  arXiv         Discussion Paper
  https://github.com/KevinHuang8/CEM-GD      Related Item  Code
ORCID:
  Author             ORCID
  Huang, Kevin       0000-0001-8195-8912
  Lale, Sahin        0000-0002-7191-346X
  Rosolia, Ugo       0000-0002-1682-0551
  Shi, Yuanyuan      0000-0002-6182-7664
  Anandkumar, Anima  0000-0002-6974-6797
Additional Information: © K. Huang, S. Lale, U. Rosolia, Y. Shi & A. Anandkumar.
Subject Keywords: Control and Planning, Cross-Entropy Method, Nonlinear Systems, Model-based RL
Record Number: CaltechAUTHORS:20220714-224628331
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220714-224628331
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115600
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 15 Jul 2022 23:19
Last Modified: 23 Dec 2022 00:53
