
Eventual Discounting Temporal Logic Counterfactual Experience Replay

Voloshin, Cameron and Verma, Abhinav and Yue, Yisong (2023) Eventual Discounting Temporal Logic Counterfactual Experience Replay. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution.



Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find maximally LTL-satisfying policies. This paper makes two contributions. First, we develop a new value-function-based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with the highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.

Item Type: Report or Paper (Discussion Paper)
Related URLs (URL, URL Type, Description): Paper
ORCID:
Verma, Abhinav: 0000-0002-9820-8285
Yue, Yisong: 0000-0001-9127-1989
Additional Information: Attribution 4.0 International (CC BY 4.0)
Record Number: CaltechAUTHORS:20230316-204049328
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 120107
Deposited By: George Porter
Deposited On: 17 Mar 2023 00:37
Last Modified: 17 Mar 2023 00:37
