Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation
arXiv (2024)
Abstract
In recent years, there has been considerable research activity focused on non-asymptotic convergence analyses of actor-critic algorithms. Recently, a two-timescale critic-actor algorithm was presented for the discounted cost setting in the look-up table case, where the timescales of the actor and the critic are reversed and only asymptotic convergence is shown. In our work, we present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting, together with the first finite-time non-asymptotic as well as asymptotic convergence analysis for such a scheme. We obtain optimal learning rates and prove that our algorithm achieves a sample complexity of Õ(ϵ^-2.08) for the mean squared error of the critic to be upper bounded by ϵ, which is better than the one obtained for two-timescale actor-critic in a similar setting. A notable feature of our analysis is that, unlike recent single-timescale actor-critic algorithms, we present a complete asymptotic convergence analysis of our scheme in addition to the finite-time bounds, and we show that the (slower) critic recursion converges asymptotically to the attractor of an associated differential inclusion, with actor parameters corresponding to local maxima of a perturbed average reward objective. We also report numerical experiments on three benchmark settings and observe that our critic-actor algorithm performs on par with, and in fact better than, the other algorithms considered.
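To make the reversed-timescale idea concrete, below is a minimal Python sketch of an average-reward critic-actor loop with linear function approximation. The environment interface (env.reset, env.step), the feature maps phi and psi, the softmax policy parameterization, and the step-size exponents 0.55 and 0.95 are illustrative assumptions, not the paper's exact algorithm or its optimal learning rates; the only property the sketch is meant to exhibit is that the critic's step size decays faster than the actor's, placing the critic on the slower timescale.

```python
import numpy as np

def softmax_policy(theta, psi, s, n_actions, rng):
    """Sample an action from a linear-softmax policy (assumed form)."""
    prefs = np.array([theta @ psi(s, a) for a in range(n_actions)])
    prefs -= prefs.max()                        # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(n_actions, p=probs), probs

def critic_actor(env, phi, psi, d_v, d_pi, n_actions,
                 num_steps=100_000, seed=0):
    """Hypothetical critic-actor sketch: env.reset() -> s and
    env.step(a) -> (s_next, reward) are assumed interfaces."""
    rng = np.random.default_rng(seed)
    w = np.zeros(d_v)      # critic (value) parameters -- SLOWER timescale
    theta = np.zeros(d_pi) # actor (policy) parameters -- FASTER timescale
    eta = 0.0              # running estimate of the long-run average reward
    s = env.reset()
    for t in range(1, num_steps + 1):
        # Actor step size decays more slowly than the critic's, so the
        # actor moves on the faster timescale (reversing actor-critic).
        alpha = 1.0 / t ** 0.55   # actor (fast); exponents are assumptions
        beta = 1.0 / t ** 0.95    # critic (slow)
        a, probs = softmax_policy(theta, psi, s, n_actions, rng)
        s_next, r = env.step(a)
        # Average-reward TD error: eta takes the place of discounting.
        delta = r - eta + w @ phi(s_next) - w @ phi(s)
        # Critic and average-reward estimate update on the slow timescale.
        w += beta * delta * phi(s)
        eta += beta * (r - eta)
        # Actor: policy-gradient step with the TD error as advantage proxy.
        grad_log = psi(s, a) - sum(probs[b] * psi(s, b)
                                   for b in range(n_actions))
        theta += alpha * delta * grad_log
        s = s_next
    return theta, w, eta
```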