Li, Kun and Sui, Yanan and Burdick, Joel W. (2017) Bellman Gradient Iteration for Inverse Reinforcement Learning. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20190410-120640737
PDF (Submitted Version), 432kB. See Usage Policy.
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20190410-120640737
Abstract
This paper develops an inverse reinforcement learning algorithm that recovers a reward function from the observed actions of an agent. We introduce a strategy that flexibly handles different types of actions using two approximations of the Bellman Optimality Equation, together with a Bellman Gradient Iteration method that computes the gradient of the Q-value with respect to the reward function. These methods allow us to build a differentiable relation between the Q-value and the reward function and to learn an approximately optimal reward function with gradient methods. We test the proposed method in two simulated environments by evaluating the accuracy of the different approximations and comparing the proposed method with existing solutions. The results show that, even with a linear reward function, the proposed method achieves accuracy comparable to that of the state-of-the-art method, which adopts a non-linear reward function, and that the proposed method is more flexible because it is defined on observed actions instead of trajectories.
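The abstract does not spell out the approximations or the update rule, so the following is only a minimal sketch of the general idea, under stated assumptions: the max in the Bellman Optimality Equation is replaced by a log-sum-exp smoothing (one common differentiable choice, not necessarily either of the paper's two approximations), the MDP is tabular, and the reward is linear in per-state features. All names (`soft_max`, `soft_weights`, `q_and_gradient`, `P`, `features`, `theta`, `k`) and the toy numbers are hypothetical and not taken from the paper.

```python
import math


def soft_max(values, k):
    """Smooth stand-in for max(values): log(sum(exp(k * v))) / k (assumed form)."""
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(k * (v - m)) for v in values)) / k


def soft_weights(values, k):
    """Partial derivatives of soft_max with respect to each entry."""
    m = max(values)
    e = [math.exp(k * (v - m)) for v in values]
    z = sum(e)
    return [x / z for x in e]


def q_and_gradient(P, features, theta, gamma=0.9, k=5.0, iters=300):
    """Iterate Q and dQ/dtheta side by side for r(s) = features[s] . theta.

    P[s][a][t] is the probability of moving from state s to state t
    under action a. Returns Q[s][a] and dQ[s][a][i] = dQ(s, a)/dtheta_i.
    """
    n_s, n_a, n_f = len(P), len(P[0]), len(theta)
    r = [sum(f * w for f, w in zip(features[s], theta)) for s in range(n_s)]
    Q = [[0.0] * n_a for _ in range(n_s)]
    dQ = [[[0.0] * n_f for _ in range(n_a)] for _ in range(n_s)]
    for _ in range(iters):
        # Smoothed state values and their sensitivities to each Q(s, a).
        V = [soft_max(Q[s], k) for s in range(n_s)]
        W = [soft_weights(Q[s], k) for s in range(n_s)]
        # Chain rule through soft_max: dV(s) = sum_a W(s, a) * dQ(s, a).
        dV = [[sum(W[s][a] * dQ[s][a][i] for a in range(n_a))
               for i in range(n_f)] for s in range(n_s)]
        # Smoothed Bellman backup for Q ...
        Q = [[r[s] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        # ... and the matching backup for its gradient.
        dQ = [[[features[s][i]
                + gamma * sum(P[s][a][t] * dV[t][i] for t in range(n_s))
                for i in range(n_f)] for a in range(n_a)] for s in range(n_s)]
    return Q, dQ


# Toy 2-state, 2-action MDP (made-up numbers, purely for illustration).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
features = [[1.0, 0.0], [0.0, 1.0]]
theta = [0.5, 1.0]
Q, dQ = q_and_gradient(P, features, theta)
print(Q[0], dQ[0][0])
```

Because the gradient backup is the exact derivative of the smoothed Q backup at every step, `dQ` can be checked against finite differences of `Q`. With such a gradient in hand, any differentiable objective defined on observed state-action pairs could, in principle, be optimized over `theta` with standard gradient methods.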
| Item Type: | Report or Paper (Discussion Paper) |
|---|---|
| Additional Information: | This work was supported by the National Institutes of Health, NIBIB. |
| Record Number: | CaltechAUTHORS:20190410-120640737 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20190410-120640737 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 94635 |
| Collection: | CaltechAUTHORS |
| Deposited By: | George Porter |
| Deposited On: | 10 Apr 2019 19:52 |
| Last Modified: | 09 Mar 2020 13:19 |