Regularized Q-Learning with Linear Function Approximation
CoRR (2024)
Abstract
Several successful reinforcement learning algorithms make use of
regularization to promote multi-modal policies that exhibit enhanced
exploration and robustness. With function approximation, the convergence
properties of some of these algorithms (e.g., soft Q-learning) are not well
understood. In this paper, we consider a single-loop algorithm for minimizing
the projected Bellman error, with finite-time convergence guarantees in the
case of linear function approximation. The algorithm operates on two
timescales: a slower scale for updating the target network of the
state-action values, and a faster scale for approximating the Bellman backups
in the subspace spanned by the basis vectors. We show that, under certain
assumptions, the proposed algorithm converges to a stationary point in the
presence of Markovian noise. In addition, we provide a performance guarantee
for the policies derived from the proposed algorithm.
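The abstract describes a two-timescale structure: a fast iterate approximates the regularized Bellman backup within the span of the feature vectors, while a slow iterate serves as the target network for the state-action values. The sketch below illustrates that structure for entropy-regularized (soft) Q-learning with linear features. It is a minimal illustration under assumptions not stated in the abstract: the feature map phi, the log-sum-exp backup with temperature tau, and the step sizes alpha and beta are all hypothetical choices, not the paper's exact updates.

import numpy as np

def soft_backup(theta_bar, phi, r, s_next, actions, tau, gamma):
    """Entropy-regularized Bellman target:
    r + gamma * tau * log sum_a exp(Q(s', a) / tau),
    with Q(s', a) = phi(s', a) @ theta_bar (linear approximation)."""
    q_next = np.array([phi(s_next, a) @ theta_bar for a in actions])
    return r + gamma * tau * np.log(np.sum(np.exp(q_next / tau)))

def two_timescale_step(theta, theta_bar, phi, transition, actions,
                       tau=0.1, gamma=0.99, alpha=1e-3, beta=1e-2):
    """One update of the two-timescale scheme (illustrative, beta >> alpha):
    - fast timescale: theta chases the projected Bellman backup of the
      slow target parameters theta_bar;
    - slow timescale: theta_bar drifts toward theta (target-network update)."""
    s, a, r, s_next = transition
    target = soft_backup(theta_bar, phi, r, s_next, actions, tau, gamma)
    td_error = target - phi(s, a) @ theta
    theta = theta + beta * td_error * phi(s, a)          # fast iterate
    theta_bar = theta_bar + alpha * (theta - theta_bar)  # slow target
    return theta, theta_bar

The separation of step sizes (beta much larger than alpha) is what makes the scheme single-loop: both iterates are updated on every sample, with no inner optimization loop for the Bellman backup.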