Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis
Creators
- Li, Kun
- Burdick, Joel W.
Abstract
This work addresses the inverse reinforcement learning (IRL) problem in which only a small number of demonstrations per high-dimensional task are available from a demonstrator, too few to estimate an accurate reward function. Observing that each demonstrator has an inherent reward for each state and that task-specific behaviors depend mainly on a small number of key states, we propose a meta IRL algorithm that first models each task's reward function as a distribution conditioned on a baseline reward function, which is shared by all tasks and depends only on the demonstrator, and then finds the most likely reward function in that distribution that explains the task-specific behaviors. We test the method on path-planning tasks in a simulated environment with limited demonstrations and show that the accuracy of the learned reward function improves significantly.
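As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below assumes a linear reward r(s) = w·phi(s), a Gaussian prior on each task's weights centered on shared baseline weights w0, and a simple Boltzmann model of which states the few demonstrations visit. The function names, the MAP objective, and the alternating update of the baseline are all hypothetical choices made for the example.

```python
import numpy as np

def fit_task_reward(phi, demo_states, w0, lam=1.0, lr=0.1, iters=200):
    """MAP estimate of one task's reward weights under a Gaussian prior
    centered on the shared baseline weights w0 (illustrative model only).

    phi         : (n_states, d) state feature matrix
    demo_states : indices of states visited in the task's few demonstrations
    w0          : (d,) baseline reward weights shared across tasks
    lam         : prior precision; larger values pull w toward w0
    """
    w = w0.copy()
    for _ in range(iters):
        scores = phi @ w                          # reward of every state
        p = np.exp(scores - scores.max())
        p /= p.sum()                              # Boltzmann distribution over states
        # gradient of the demo log-likelihood minus the Gaussian prior penalty
        grad = phi[demo_states].sum(axis=0) - len(demo_states) * (phi.T @ p)
        grad -= lam * (w - w0)
        w += lr * grad / len(demo_states)
    return w

def fit_shared_baseline(phi, demos_per_task, lam=1.0, outer_iters=10):
    """Alternate between per-task MAP rewards and the shared baseline,
    taken here as the mean of the task weights (an illustrative choice)."""
    w0 = np.zeros(phi.shape[1])
    for _ in range(outer_iters):
        task_ws = [fit_task_reward(phi, demo, w0, lam) for demo in demos_per_task]
        w0 = np.mean(task_ws, axis=0)
    return w0, task_ws

# Toy usage with random features and a handful of demonstrated states per task.
rng = np.random.default_rng(0)
phi = rng.normal(size=(50, 4))                    # 50 states, 4 features
demos = [[3, 7, 7, 12], [5, 5, 9], [1, 20, 20]]   # a few visited states per task
w0, task_ws = fit_shared_baseline(phi, demos)
```

The prior term lam * (w - w0) is what plays the role of the shared baseline: with very few demonstrations per task, each task's estimate stays close to the demonstrator-level reward and only moves where the task's key states provide evidence.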
Attached Files
Published - metalearn17_li.pdf
Files
Name | Size | md5
---|---|---
metalearn17_li.pdf | 135.2 kB | a026d4158b14b273f93541052b9fdfa1
Additional details
- Eprint ID: 94631
- Resolver ID: CaltechAUTHORS:20190410-120626845
- Created: 2019-04-11 (from EPrint's datestamp field)
- Updated: 2023-06-02 (from EPrint's last_modified field)