Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation

arxiv(2024)

引用 0|浏览0
暂无评分
摘要
In recent years, there has been a lot of research activity focused on carrying out non-asymptotic convergence analyses for actor-critic algorithms. Recently a two-timescale critic-actor algorithm has been presented for the discounted cost setting in the look-up table case where the timescales of the actor and the critic are reversed and only asymptotic convergence shown. In our work, we present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting and present the first finite-time non-asymptotic as well as asymptotic convergence analysis for such a scheme. We obtain optimal learning rates and prove that our algorithm achieves a sample complexity of Õ(ϵ^-2.08) for the mean squared error of the critic to be upper bounded by ϵ which is better than the one obtained for two-timescale actor-critic in a similar setting. A notable feature of our analysis is that unlike recent single-timescale actor-critic algorithms, we present a complete asymptotic convergence analysis of our scheme in addition to the finite-time bounds that we obtain and show that the (slower) critic recursion converges asymptotically to the attractor of an associated differential inclusion with actor parameters corresponding to local maxima of a perturbed average reward objective. We also show the results of numerical experiments on three benchmark settings and observe that our critic-actor algorithm performs on par and is in fact better than the other algorithms considered.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要