FLAIR Seminar: Solving pure exploration problems with the Top Two approach

Abstract

In pure exploration problems for stochastic multi-armed bandits, the goal is to answer a question about a set of unknown distributions (modeling for example the efficacy of a treatment) from which we can collect samples (measure its effect), and to provide guarantees on the candidate answer. The archetypal example is the best arm identification problem, in which the agent aims at identifying the arm with the highest mean. In this talk, I will focus on the class of Top Two algorithms, which select the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and interpretability, Top Two algorithms have received increased attention in recent years. In the fixed-confidence setting, Top Two algorithms have an asymptotically optimal expected sample complexity (number of collected samples when the error level vanishes). In the anytime setting, we propose a Top Two algorithm which has guarantees on the probability of misidentifying a good enough arm at any time.

Date
Feb 19, 2024 1:15 PM
Location
EPFL
Lausanne
Marc Jourdan
Marc Jourdan
PhD Student

PhD student at Scool (Inria), I study identification problems in Multi-Armed Bandits.