
A Class of Bandit Problems Yielding Myopic Optimal Strategies

Banks, Jeffrey S. and Sundaram, Rangarajan K. (1992) A Class of Bandit Problems Yielding Myopic Optimal Strategies. Journal of Applied Probability, 29 (3). pp. 625-632. ISSN 0021-9002. doi:10.2307/3214899.

We consider the class of bandit problems in which each of the n ≧ 2 independent arms generates rewards according to one of the same two reward distributions, and discounting is geometric over an infinite horizon. We show that the dynamic allocation index of Gittins and Jones (1974) in this context is strictly increasing in the probability that an arm is the better of the two distributions. It follows as an immediate consequence that myopic strategies are the uniquely optimal strategies in this class of bandit problems, regardless of the value of the discount parameter or the shape of the reward distributions. Some implications of this result for bandits with Bernoulli reward distributions are given.
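The Bernoulli case mentioned above can be illustrated with a small simulation. The sketch below is not the paper's construction; it is a minimal, hypothetical setup assuming each arm is secretly a "good" Bernoulli(p_h) arm or a "bad" Bernoulli(p_l) arm with p_h > p_l, beliefs start at 1/2, and the myopic rule pulls the arm with the highest posterior probability of being the good type (which, since expected immediate reward q·p_h + (1−q)·p_l is increasing in q, is exactly the greedy choice the result shows to be optimal).

```python
import random

def bayes_update(q, reward, p_h, p_l):
    """Posterior probability that an arm is the 'good' type
    after observing one Bernoulli reward, starting from prior q."""
    if reward:
        num, den = q * p_h, q * p_h + (1 - q) * p_l
    else:
        num, den = q * (1 - p_h), q * (1 - p_h) + (1 - q) * (1 - p_l)
    return num / den

def myopic_bandit(n_arms=2, p_h=0.7, p_l=0.3, horizon=1000, seed=0):
    """Run the myopic (greedy) strategy on a two-type Bernoulli bandit.

    Each arm's type is drawn once and hidden; beliefs are updated by
    Bayes' rule after every pull.  Returns total reward and final beliefs.
    """
    rng = random.Random(seed)
    types = [rng.random() < 0.5 for _ in range(n_arms)]  # True = good arm
    beliefs = [0.5] * n_arms
    total = 0
    for _ in range(horizon):
        # Myopic rule: pull the arm most likely to be the good type.
        i = max(range(n_arms), key=lambda k: beliefs[k])
        p = p_h if types[i] else p_l
        reward = 1 if rng.random() < p else 0
        total += reward
        beliefs[i] = bayes_update(beliefs[i], reward, p_h, p_l)
    return total, beliefs
```

With a flat 1/2 prior, a success raises the belief from 0.5 to p_h/(p_h + p_l) = 0.7 under these illustrative parameters, and a failure lowers it; the greedy arm choice therefore tracks the probability each arm is the better distribution, which is the quantity the paper shows the Gittins index is strictly increasing in.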

Item Type: Article
Additional Information: © 1992 Applied Probability Trust. Received 28 August 1990; revision received 8 May 1991. Financial support from the National Science Foundation and the Sloan Foundation to the first author is gratefully acknowledged.
Funding Agency: Alfred P. Sloan Foundation (Grant Number: unspecified)
Issue or Number: 3
Record Number: CaltechAUTHORS:20160525-080809749
Official Citation: Banks, Jeffrey S., and Rangarajan K. Sundaram. "A Class of Bandit Problems Yielding Myopic Optimal Strategies." Journal of Applied Probability 29, no. 3 (1992): 625-32.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 67332
Deposited By: Ruth Sustaita
Deposited On: 26 May 2016 20:55
Last Modified: 11 Nov 2021 00:30
