Voloshin, Cameron and Verma, Abhinav and Yue, Yisong (2023) Eventual Discounting Temporal Logic Counterfactual Experience Replay. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20230316-204049328
PDF - Submitted Version, Creative Commons Attribution, 4MB
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20230316-204049328
Abstract
Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find maximally LTL-satisfying policies. This paper makes two contributions. First, we develop a new value-function-based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with the highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.
Item Type: | Report or Paper (Discussion Paper)
---|---
Additional Information: | Attribution 4.0 International (CC BY 4.0)
Record Number: | CaltechAUTHORS:20230316-204049328
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20230316-204049328
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: | 120107
Collection: | CaltechAUTHORS
Deposited By: | George Porter
Deposited On: | 17 Mar 2023 00:37
Last Modified: | 17 Mar 2023 00:37