Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
We present an on-line learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, we require only pairwise comparisons, which have been shown to be reliably inferred from implicit feedback (Joachims et al., 2007; Radlinski et al., 2008b). We also present an algorithm with theoretical guarantees, together with simulation results.
Copyright 2009 by the author(s)/owner(s). The work was funded under NSF Award IIS-0713483, NSF CAREER Award 0237381, and a gift from Yahoo! Research. The first author is also partly funded by a Microsoft Research Graduate Fellowship and a Yahoo! Key Technical Challenges Grant. The authors also thank Robert Kleinberg, Josef Broder and the anonymous reviewers for their helpful comments.