On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

Mudit Gaur,Vaneet Aggarwal,Mridul Agarwal

arxiv（2023）

引用 0|浏览6

暂无评分

摘要

Deep Q-learning based algorithms have been applied successfully in many decision making problems, while their theoretical foundations are not as well understood. In this paper, we study a Fitted Q-Iteration with two-layer ReLU neural network parameterization, and find the sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration using a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for a countable state-spaces and does not require any assumptions such as a linear or low rank structure on the MDP.

查看译文

关键词

network parametrization,global convergence,q-iteration,two-layer

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要