Published February 23, 2020 | Version Submitted
Discussion Paper | Open Access

On Thompson Sampling with Langevin Algorithms

Abstract

Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, it suffers from a significant computational limitation, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyperparameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse mechanism to ensure that only a constant number of iterations and a constant amount of data are needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
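To make the abstract's idea concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it runs Thompson sampling on a toy Gaussian bandit, drawing each arm's posterior sample with a few unadjusted Langevin (gradient-plus-noise) steps instead of an exact sampler. The arm means, the N(0, 1) priors, the step sizes, and the warm-start "sample reuse" are all assumptions chosen for illustration; the paper's guarantees cover a more general setting with precise hyperparameter choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_sample(grad_log_post, theta0, step, n_steps=50):
    """Unadjusted Langevin algorithm (ULA): a fixed, constant number of
    gradient-plus-noise steps; returns an approximate posterior sample."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta + step * grad_log_post(theta) \
                + np.sqrt(2.0 * step) * rng.standard_normal()
    return theta

# Toy Gaussian bandit: arm k has an unknown mean, unit-variance rewards,
# and a N(0, 1) prior on each mean (all hypothetical choices).
true_means = np.array([0.2, 0.5, 0.9])
K = len(true_means)
counts = np.zeros(K)
sums = np.zeros(K)
last_sample = np.zeros(K)  # sample reuse: warm-start each chain

for t in range(500):
    samples = np.empty(K)
    for k in range(K):
        n, s = counts[k], sums[k]
        # Gradient of the log posterior for arm k's mean:
        # log p(theta | data) is proportional to -(n + 1) theta^2 / 2 + s * theta.
        grad = lambda th, n=n, s=s: -(n + 1.0) * th + s
        # Step size shrinks as the posterior concentrates, so the number
        # of Langevin iterations per round can stay constant.
        samples[k] = langevin_sample(grad, last_sample[k], step=0.1 / (n + 1.0))
        last_sample[k] = samples[k]
    arm = int(np.argmax(samples))
    reward = true_means[arm] + rng.standard_normal()
    counts[arm] += 1
    sums[arm] += reward

print("pull counts per arm:", counts)  # the best arm typically dominates
```

The warm start reflects the abstract's sample reuse mechanism: because consecutive posteriors are close, restarting each Langevin chain from the previous round's sample lets a constant number of steps suffice per round.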

Attached Files

Submitted - 2002.10002.pdf (993.4 kB, md5: ecf93964398318c03bb049e54025f627)

Additional details

Identifiers

Eprint ID: 110727
Resolver ID: CaltechAUTHORS:20210903-220351518

Dates

Created: 2021-09-07 (from EPrint's datestamp field)
Updated: 2023-06-02 (from EPrint's last_modified field)