Co-training for Policy Learning
We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings arise naturally in domains such as planning (e.g., multiple integer programming formulations) and combinatorial optimization (e.g., problems with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.
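The abstract does not spell out the meta-algorithm, but the underlying co-training idea can be illustrated with a toy sketch: two policies, each operating on a different view (representation) of the same state, take turns labeling states on which they are confident, and those (state, action) labels supervise the other policy. Everything below (the two views, the tabular policies, the confidence threshold, the expert seed) is a hypothetical illustration of classical co-training applied to action labeling, not the paper's actual algorithm.

```python
# Hypothetical toy setup: two "views" of the same underlying integer state.
def view_a(s):
    return (s,)                 # identity features

def view_b(s):
    return (s % 3, s // 3)      # an alternative, factored representation

class TabularPolicy:
    """Minimal tabular policy: maps view features to per-action scores."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = {}

    def scores(self, feats):
        return self.q.setdefault(feats, [0.0] * self.n_actions)

    def act(self, feats):
        s = self.scores(feats)
        m = max(s)
        return s.index(m), m    # greedy action and its confidence score

    def update(self, feats, action, target=1.0, lr=0.5):
        s = self.scores(feats)
        s[action] += lr * (target - s[action])

def co_train(policy_a, policy_b, states, rounds=10, threshold=0.5):
    """Each round, each policy labels the states it is confident on,
    and those (state, action) pairs supervise the other policy."""
    for _ in range(rounds):
        for pol, other, view, other_view in (
            (policy_a, policy_b, view_a, view_b),
            (policy_b, policy_a, view_b, view_a),
        ):
            for s in states:
                action, conf = pol.act(view(s))
                if conf >= threshold:
                    other.update(other_view(s), action)
    return policy_a, policy_b

# Seed one policy with a few expert labels (imitation-style), then co-train.
expert = {0: 1, 1: 0, 2: 1}     # hypothetical expert: state -> action
pa, pb = TabularPolicy(2), TabularPolicy(2)
for s, a in expert.items():
    pa.update(view_a(s), a, lr=1.0)
co_train(pa, pb, states=range(6))
```

After co-training, the second policy has absorbed the expert's action labels through its own view, even though it never saw the demonstrations directly; the confidence threshold keeps unconfident (and thus potentially wrong) labels from propagating.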
This work was funded in part by NSF awards #1637598 & #1645832, and by support from Raytheon and Northrop Grumman. This research was also conducted in part at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.
arXiv:1907.04484