Denumerable-Armed Bandits
- Creators
- Banks, Jeffrey S.
- Sundaram, Rangarajan K.
Abstract
This paper studies the class of denumerable-armed (i.e. finite- or countably infinitearmed) bandit problems with independent arms and geometric discounting over an infinite horizon, in which each arm generates rewards according to one of a finite number of distributions, or "types." The number of types in the support of an arm, as also the types themselves, are allowed to vary across the arms. We derive certain continuity and curvature properties of the dynamic allocation (or Gittins) index of Gittins and Jones (1974), and provide necessary and sufficient conditions under which the Gittins-Jones result identifying all optimal strategies for finite-armed bandits may be extended to infinite-armed bandits. We then establish our central result: at each point in time, the arm selected by an optimal strategy will, with strictly positive probability, remain an optimal selection forever. More specifically, for every such arm, there exists (at least) one type of that arm such that, when conditioned on that type being the arm's "true" type, the arm will survive forever and continuously with nonzero probability. When the reward distributions of an arm satisfy the monotone likelihood ratio property (MLRP), the survival prospects of an arm improve when conditioned on types generating higher expected rewards; however, we show how this need not be the case in the absence of MLRP. Implications of these results are derived for the theories of job search and matching, as well as other applications of the bandit paradigm.
Additional Information
© 1992 The Econometric Society. This paper has benefited immeasurably from the detailed comments and suggestions offered by Andy McLennan as a (then anonymous) referee. We are also very grateful to Martin Hellwig and two other anonymous referees for their careful reading of earlier drafts and their suggestions; as also to seminar audiences at Buffalo, Caltech, Carnegie-Mellon, Columbia, George Mason, Harvard, Hoover Institution, McMaster, Michigan, Johns Hopkins, Rochester, Toronto, UCLA, UC-Riverside, Washington-St. Louis, Western Ontario, and Yale, for their input. In particular, we would like to thank Prajit Dutta, Mahmoud El-Gamal, Nicholas Kiefer, David Levine, and Bill Zame. The first author gratefully acknowledges financial support provided by the Sloan Foundation and the NSF. The final draft of this paper was completed when the second author was visiting the California Institute of Technology, and he would like to thank them for their hospitality. With transparent modifications, all of our results remain valid if, instead of reward densities, we had discrete reward distributions (i.e., those with finite or countable support).Attached Files
Published - 2951539.pdf
Files
Name | Size | Download all |
---|---|---|
md5:f5b152538dc9a3c5d073e0133e8ba080
|
35.0 MB | Preview Download |
Additional details
- Eprint ID
- 67326
- Resolver ID
- CaltechAUTHORS:20160525-072126405
- Alfred P. Sloan Foundation
- NSF
- Created
-
2016-05-26Created from EPrint's datestamp field
- Updated
-
2021-11-11Created from EPrint's last_modified field