"Inference for Batched Bandits"
Susan Murphy, Harvard University
Discussant: Stefan Wager, Stanford University
Abstract: As bandit algorithms are increasingly used in scientific studies and industrial applications, there is a growing need for reliable inference methods based on the resulting adaptively collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We prove that when there is no unique optimal arm, the ordinary least squares (OLS) estimator is not asymptotically normal on data collected using standard bandit algorithms. This is the case even when the bandit is constrained to select each arm with probabilities bounded away from 0 and 1. We show that this problem can be traced to the fact that the arm selection probabilities do not concentrate. We take advantage of the batched setting to develop a Batched OLS (BOLS) estimator that we prove is (1) asymptotically normal on data collected from both multi-armed and contextual bandits and (2) robust to nonstationarity in the baseline reward. This is joint work with Kelly Zhang and Lucas Janson.
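The batching idea behind BOLS can be illustrated with a small simulation. This is our own sketch, not code from the talk or paper, and it rests on several assumptions: two arms, Gaussian rewards with a known noise scale `sigma`, and a hypothetical clipped logistic policy (`simulate_batched_bandit`) whose arm probabilities adapt between batches but stay bounded away from 0 and 1. Each batch's OLS arm difference is standardized using only that batch's arm counts. The standardized terms are then summed and divided by sqrt(T); this combination rule is an illustrative choice. Because arms are compared within each batch, a baseline reward that drifts across batches cancels out.

```python
import math
import random


def batch_ols_difference(actions, rewards):
    """Within-batch OLS estimate of E[R | arm 1] - E[R | arm 0].

    With an intercept and a binary arm indicator, OLS on one batch reduces to
    the difference of the two arm-wise sample means.
    """
    r1 = [r for a, r in zip(actions, rewards) if a == 1]
    r0 = [r for a, r in zip(actions, rewards) if a == 0]
    return sum(r1) / len(r1) - sum(r0) / len(r0), len(r1), len(r0)


def bols_z_statistic(batches, delta0=0.0, sigma=1.0):
    """Combine per-batch OLS estimates into one z-statistic for H0: delta = delta0.

    Each batch contributes sqrt(n1 * n0 / n) * (delta_hat - delta0) / sigma,
    standardized with that batch's own arm counts; the T terms are summed and
    divided by sqrt(T). `sigma` is assumed known (an illustrative assumption).
    """
    terms = []
    for actions, rewards in batches:
        delta_hat, n1, n0 = batch_ols_difference(actions, rewards)
        n = n1 + n0
        terms.append(math.sqrt(n1 * n0 / n) * (delta_hat - delta0) / sigma)
    return sum(terms) / math.sqrt(len(terms))


def simulate_batched_bandit(T=10, n=100, delta=0.0, clip=0.1, seed=0):
    """Hypothetical two-arm batched bandit with a drifting baseline reward.

    The arm-1 probability for each batch is a logistic function of the average
    of earlier per-batch estimates, clipped to [clip, 1 - clip].
    """
    rng = random.Random(seed)
    batches, estimates, p = [], [], 0.5
    for t in range(T):
        baseline = math.sin(t)  # nonstationary baseline shared by both arms
        # Simplification: redraw arm assignments until both arms appear.
        while True:
            actions = [1 if rng.random() < p else 0 for _ in range(n)]
            if 0 < sum(actions) < n:
                break
        rewards = [baseline + delta * a + rng.gauss(0.0, 1.0) for a in actions]
        batches.append((actions, rewards))
        estimates.append(batch_ols_difference(actions, rewards)[0])
        # Adapt the policy for the next batch; clamp the score to avoid overflow.
        score = max(-50.0, min(50.0, 5.0 * sum(estimates) / len(estimates)))
        p = min(1 - clip, max(clip, 1 / (1 + math.exp(-score))))
    return batches


if __name__ == "__main__":
    z = bols_z_statistic(simulate_batched_bandit(delta=0.0, seed=1))
    print(f"BOLS-style z-statistic under H0: {z:.3f}")
```

Repeating the simulation over many seeds with `delta=0.0` and counting how often |z| exceeds 1.96 is one way to check the normal approximation in this toy setting. Adding a constant to every reward in a batch leaves the statistic unchanged, which mirrors the robustness to baseline drift described in the abstract.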
May 19, 2020
3 Oct 2024