Research interests:

Bandit theory
    Optimistic algorithms (KL-UCB, UCB-V), Thompson sampling, many-armed bandits
    Foundations of Monte-Carlo Tree Search
    Optimistic optimization (HOO, SOO, StoSOO), optimistic planning (OP-MDP, OLOP)
    Bandits in graphs and other structured spaces

Reinforcement Learning (RL)
    Analysis of Reinforcement Learning and Dynamic Programming (DP) with function approximation
    Finite-sample analysis of RL and DP (Lasso-TD, LSTD, AVI, API, BRM, compressed-LSTD)
    Policy gradient and sensitivity analysis
    Sampling methods for MDPs, Bayesian RL, POMDPs

Optimal control in continuous time
    Numerical solutions to HJB equations
    Stability analysis via viscosity solutions
    Variable resolution discretizations

Statistical learning and randomization
    Random projections for least squares regression
    Adaptive sampling for Monte-Carlo integration
    Active learning and sparse bandits