A Caltech Library Service

Parallelizing Exploration-Exploitation Tradeoffs in Gaussian Process Bandit Optimization

Desautels, Thomas and Krause, Andreas and Burdick, Joel W. (2014) Parallelizing Exploration-Exploitation Tradeoffs in Gaussian Process Bandit Optimization. Journal of Machine Learning Research, 15 . pp. 3873-3923. ISSN 1533-7928.

PDF - Published Version (see Usage Policy)

PDF - Submitted Version (see Usage Policy)


How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is desirable to execute experiments simultaneously or in batches rather than one at a time. Additionally, observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an upper confidence bound-based algorithm, which models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel. We prove a general regret bound for GP-BUCB, as well as the surprising result that for some common kernels, the asymptotic average regret can be made independent of the batch size. The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results, for which the same regret bounds hold. We also introduce Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB), a variant of GP-BUCB which can exploit parallelism in an adaptive manner. We evaluate GP-BUCB and GP-AUCB on several simulated and real data sets. These experiments show that GP-BUCB and GP-AUCB are competitive with state-of-the-art heuristics.
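The core batch-selection idea described in the abstract can be sketched in a few lines: within a batch, each pending experiment's outcome is "hallucinated" as the current posterior mean, and the GP posterior variance (which does not depend on the observed values) is updated before the next point is chosen by an upper confidence bound rule. The sketch below is illustrative only, assuming an RBF kernel, a fixed lengthscale, a fixed confidence parameter beta, and a toy 1-D objective; none of these are the paper's actual settings or implementation.

```python
import numpy as np

def rbf(A, B, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets (illustrative choice)."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-2):
    """Posterior mean and variance of a GP at candidate points."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # RBF prior variance is 1 on the diagonal
    return mu, np.maximum(var, 1e-12)

def gp_bucb_batch(X_obs, y_obs, X_cand, batch_size, beta=2.0):
    """Select a batch in the GP-BUCB style: after each pick, feed back the
    posterior mean as a hallucinated observation. The variance update does
    not depend on y, so uncertainty shrinks at pending points and the
    batch spreads out instead of repeatedly querying one location."""
    Xh, yh = list(X_obs), list(y_obs)
    batch = []
    for _ in range(batch_size):
        mu, var = gp_posterior(np.array(Xh), np.array(yh), X_cand)
        idx = int(np.argmax(mu + np.sqrt(beta * var)))  # upper confidence bound
        batch.append(float(X_cand[idx]))
        Xh.append(float(X_cand[idx]))
        yh.append(float(mu[idx]))  # hallucinated observation
    return batch

# Toy usage: three noiseless observations of sin(3x) on [0, 1].
X_obs = np.array([0.1, 0.5, 0.9])
y_obs = np.sin(3 * X_obs)
X_cand = np.linspace(0.0, 1.0, 101)
batch = gp_bucb_batch(X_obs, y_obs, X_cand, batch_size=3)
```

Because only hallucinated means are fed back, the whole batch can be dispatched to parallel experiments before any real result returns; the same mechanism handles the delayed-feedback setting the abstract mentions.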

Item Type: Article
Related URLs:
URL | URL Type | Description
 | | Paper
Alternate Title: Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
Additional Information: © 2014 Thomas Desautels, Andreas Krause, and Joel W. Burdick. Submitted 4/13; Revised 6/14; Published 12/14. A previous version of this work appeared in the Proceedings of the 29th International Conference on Machine Learning, 2012. The authors thank Daniel Golovin for helpful discussions, the Edgerton laboratory at UCLA for the SCI data set, and the anonymous reviewers for their careful reading and many helpful suggestions. This work was partially supported by NIH project R01 NS062009, SNSF grant 200021_137971, NSF IIS-0953413, DARPA MSEE FA8650-11-1-7156, ERC StG 307036, a Microsoft Research Faculty Fellowship (AK) and a ThinkSwiss Research Scholarship (TD).
Funding Agency | Grant Number
NIH | R01 NS062009
Swiss National Science Foundation (SNSF) | 200021_137971
European Research Council | 307036
Microsoft Research Faculty Fellowship | UNSPECIFIED
ThinkSwiss Research Scholarship | UNSPECIFIED
Record Number: CaltechAUTHORS:20150611-134819131
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 58201
Deposited By: Tony Diaz
Deposited On: 13 Jun 2015 01:33
Last Modified: 03 Oct 2019 08:33
