
Prof. Bruno Gaujal
Université Grenoble Alpes, France
Title: Mean-Field Control for Restless Bandits with (exponentially fast) asymptotic optimality
Abstract: We provide a framework to analyse mean-field control policies for restless bandits. Under both finite and infinite time horizons, we show that when the population of arms goes to infinity, the value of the optimal control policy converges to the solution of a linear program (LP). We provide necessary and sufficient conditions for a generic control policy to be: i) asymptotically optimal; ii) asymptotically optimal with square root convergence rate; iii) asymptotically optimal with exponential rate. We then construct the LP-index policy, which is asymptotically optimal with square root convergence rate on all models, and with exponential rate if the model is non-degenerate in the finite-horizon case, or satisfies a uniform global attractor property in the infinite-horizon case. We next define the LP-update policy, which is essentially a repeated LP-index policy that solves a new linear program at each decision epoch. We provide numerical experiments to compare the efficiency of LP-based policies. We also compare the performance of the LP-index policy and the LP-update policy with other heuristics. Our results demonstrate that the LP-update policy outperforms the LP-index policy in general. Joint work with Nicolas Gast and Chen Yan.
Bio: Bruno Gaujal is an Inria researcher. Until December 2015, he was the head of the large-scale computing group, MESCAL, in the research center of Inria Grenoble-Rhône-Alpes. He has held several positions at AT&T Bell Labs, INRIA Sophia-Antipolis, Loria and École Normale Supérieure of Lyon. He is a former student of École Normale Supérieure of Lyon and obtained his PhD from the University of Nice in 1994, under the supervision of François Baccelli. He received his “Habilitation à diriger des recherches” in 2001 from the University of Nancy. He is the author of more than 100 scientific publications in journals and international conferences. He has been a founding partner and scientific advisor of the start-up company RTaW since 2007. His main interests are in performance evaluation, optimization and control of large discrete event dynamic systems, with applications to telecommunication networks and large computing infrastructures.