Batched Bandit Problems
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
© 2015 V. Perchet, P. Rigollet, S. Chassang & E. Snowberg. Philippe Rigollet acknowledges the support of NSF grants DMS-1317308, CAREER-DMS-1053987, the Howard B. Wentz Jr. Junior Faculty award and the Meimaris family. Vianney Perchet received support from the French National Research Agency (ANR) Project GAGA: ANR-13-JS01-0004-01. Sylvain Chassang and Erik Snowberg acknowledge support from NSF grant SES-1156154.
Published - Perchet15.pdf