# Uncertainty Quantification and Exploration for Reinforcement Learning

Operations Research (2023)

Abstract

Quantify the uncertainty to decide and explore better. In statistical inference, large-sample behavior and confidence interval construction are fundamental to assessing the error and reliability of estimated quantities with respect to noise in the data. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning”, Dong, Lam, and Zhu study large-sample behavior in the classic reinforcement learning setting. They derive large-sample asymptotic distributions for estimates of the state-action value function (Q-value) and the optimal value function when data are collected from the underlying Markov chain. This allows one to assess how confidently the performance of different decisions can be distinguished. The tight uncertainty quantification also facilitates the development of a pure exploration policy that maximizes the worst-case relative discrepancy among the estimated Q-values (the ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data so as to maximize the probability of learning the optimal reward-collecting policy, and it achieves good empirical performance.
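
To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the paper's variance expressions and exact exploration objective are not reproduced here, so the variances are simply taken as inputs. The function names `q_confidence_interval` and `worst_case_relative_discrepancy` are hypothetical, and reading "worst-case" as the minimum over suboptimal actions at a single state is an assumption.

```python
import math
from statistics import NormalDist


def q_confidence_interval(q_hat, asym_var, n, level=0.95):
    """Normal-approximation confidence interval for one Q-value.

    Assumes a central limit theorem of the form
    sqrt(n) * (q_hat - q) -> N(0, asym_var), with asym_var supplied by the caller.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    half_width = z * math.sqrt(asym_var / n)
    return q_hat - half_width, q_hat + half_width


def worst_case_relative_discrepancy(q_hat, diff_var):
    """Smallest squared-gap-to-variance ratio against the estimated best action.

    q_hat[a]    : estimated Q-value of action a at a fixed state.
    diff_var[a] : assumed variance estimate of q_hat[best] - q_hat[a]
                  (the entry for the best action itself is ignored).
    """
    best = max(range(len(q_hat)), key=lambda a: q_hat[a])
    return min(
        (q_hat[best] - q_hat[a]) ** 2 / diff_var[a]
        for a in range(len(q_hat))
        if a != best
    )


# Illustrative numbers only; they are not taken from the paper.
lo, hi = q_confidence_interval(q_hat=1.0, asym_var=4.0, n=100)
ratio = worst_case_relative_discrepancy([1.0, 0.5, 0.2], [0.0, 0.25, 0.16])
```

A small ratio means some suboptimal action is still statistically hard to separate from the estimated best one, which is the situation an exploration policy of this kind would try to improve by collecting more informative data.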

Key words

uncertainty, exploration, quantification, reinforcement, learning
