Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
IEEE Transactions on Information Theory (2020)
Abstract
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ …
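The setting described in the abstract can be illustrated with a minimal sketch: a single trajectory is generated by a fixed behavior policy, and at each step only the visited state-action entry of the Q-table is updated. The two-state MDP, the function name `async_q_learning`, the uniform behavior policy, and the constant learning rate below are all assumptions made for illustration; they are not taken from the paper, whose step-size choices and analysis are not reproduced here.

```python
import random


def async_q_learning(P, R, gamma, behavior, steps, lr, seed=0):
    """Asynchronous Q-learning along one Markovian trajectory (illustrative sketch).

    P[s][a]     : list of next-state probabilities
    R[s][a]     : deterministic reward for taking action a in state s
    behavior[s] : list of action probabilities (the behavior policy)
    lr          : constant learning rate (an assumption of this sketch)
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0  # arbitrary initial state
    for _ in range(steps):
        # Sample an action from the behavior policy, then a next state from the MDP.
        a = rng.choices(range(n_actions), weights=behavior[s])[0]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # Asynchronous update: only the visited entry (s, a) changes.
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += lr * (target - Q[s][a])
        s = s_next
    return Q


# Hypothetical toy MDP: action 0 stays put, action 1 switches state;
# every action taken in state 1 earns reward 1, in state 0 reward 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
behavior = [[0.5, 0.5], [0.5, 0.5]]  # uniform behavior policy

Q = async_q_learning(P, R, gamma=0.5, behavior=behavior, steps=20000, lr=0.1)
```

For this toy MDP with $\gamma = 0.5$, solving the Bellman optimality equations by hand gives $Q^\star(0,0)=0.5$, $Q^\star(0,1)=1$, $Q^\star(1,0)=2$, $Q^\star(1,1)=1.5$. Because transitions and rewards here are deterministic, the iterates settle at these values even with a constant learning rate; in a genuinely stochastic MDP a constant step size would leave residual noise.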
Keywords
Trajectory, Complexity theory, Markov processes, Analytical models, Steady-state, Reinforcement learning, Licenses