CaltechAUTHORS
  A Caltech Library Service

Minimax Model Learning

Voloshin, Cameron and Jiang, Nan and Yue, Yisong (2021) Minimax Model Learning. Proceedings of Machine Learning Research, 130. pp. 1612-1620. ISSN 1938-7228. https://resolver.caltech.edu/CaltechAUTHORS:20210510-100815979

PDF - Published Version (1MB). See Usage Policy.
PDF - Submitted Version (1MB). Creative Commons Attribution.
PDF - Supplemental Material (441kB). See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210510-100815979

Abstract

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.


Item Type: Article
Related URLs:
URL | URL Type | Description
http://proceedings.mlr.press/v130/voloshin21a.html | Publisher | Article
http://arxiv.org/abs/2103.02084 | arXiv | Discussion Paper
ORCID:
Author | ORCID
Yue, Yisong | 0000-0001-9127-1989
Additional Information: © 2021 by the author(s). Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Cameron Voloshin is supported in part by a Kortschak Fellowship. This work is also supported in part by NSF # 1645832, NSF # 1918839, and funding from Beyond Limits. Nan Jiang is sponsored in part by the DEVCOM Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196 (ARL IoBT CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Funders:
Funding Agency | Grant Number
Caltech | UNSPECIFIED
NSF | CNS-1645832
NSF | IIS-1918839
Beyond Limits | UNSPECIFIED
Army Research Office (ARO) | W911NF-17-2-0196
Record Number: CaltechAUTHORS:20210510-100815979
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210510-100815979
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 109031
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 10 May 2021 17:19
Last Modified: 10 May 2021 17:19
