Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity
arXiv (2023)
Abstract
Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both
powerful tools for making decisions in the presence of uncertainties. Previous
efforts have aimed to establish their connections, revealing equivalences in
specific formulations. This paper introduces a new formulation of risk-sensitive MDPs, which assesses risk in a slightly different manner from the classical Markov risk measure (Ruszczyński 2010), and establishes its equivalence with a class of soft robust MDP (RMDP) problems that includes the standard RMDP as a special case. Leveraging this equivalence, we derive the policy gradient theorem for both problems and prove gradient domination and global convergence of the exact policy gradient method in the tabular setting with direct parameterization. This stands in sharp contrast to the Markov risk measure, which is known to be potentially non-gradient-dominant (Huang et al. 2021). We also propose a sample-based offline learning algorithm, the robust fitted-Z iteration (RFZI), for a specific soft RMDP problem with a KL-divergence regularization term (equivalently, the risk-sensitive MDP with an entropy risk measure). We showcase its streamlined design and less stringent assumptions, both enabled by the equivalence, and analyze its sample complexity.
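A minimal illustration of the mechanism behind the claimed equivalence, in the KL/entropy case and in our own notation (not the paper's): by the Donsker-Varadhan variational formula, an entropic risk evaluation under the nominal kernel $P$ equals a KL-penalized worst case over kernels $Q$,

$$
-\frac{1}{\beta}\log \mathbb{E}_{s'\sim P}\!\left[e^{-\beta V(s')}\right]
= \inf_{Q \ll P}\left\{ \mathbb{E}_{s'\sim Q}\!\left[V(s')\right] + \frac{1}{\beta}\, D_{\mathrm{KL}}(Q\,\|\,P) \right\},
\qquad \beta > 0,
$$

so plugging the left-hand side into a Bellman backup evaluates risk under the nominal model while implicitly solving the soft robust (KL-regularized) adversarial problem on the right.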
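The exact form of RFZI is not reproduced here; the following is only a hypothetical tabular sketch of the KL-regularized soft robust Bellman backup that such a method's regression target would be built around. All names (soft_robust_backup, P, r, beta, gamma) are our own assumptions, not the paper's code.

```python
import numpy as np

def soft_robust_backup(V, P, r, gamma=0.95, beta=1.0):
    """One KL-regularized soft robust Bellman backup (hypothetical sketch).

    V: (S,) value estimate; P: (S, A, S) nominal kernel; r: (S, A) rewards.
    beta > 0 weights the KL penalty (beta -> 0 recovers the non-robust backup).
    """
    # Entropic evaluation -(1/beta) * log E_{s'~P}[exp(-beta * V(s'))],
    # computed with the log-sum-exp trick for numerical stability; by
    # duality this equals inf_Q { E_Q[V] + (1/beta) KL(Q || P) }.
    x = -beta * V
    m = x.max()
    risk = -(np.log(P @ np.exp(x - m)) + m) / beta  # shape (S, A)
    return (r + gamma * risk).max(axis=1)           # greedy over actions

# Toy usage: value iteration with the soft robust backup on a random MDP.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)
r = rng.random((S, A))
V = np.zeros(S)
for _ in range(500):
    V = soft_robust_backup(V, P, r)
print(np.round(V, 3))
```

Since the soft robust backup is a gamma-contraction, the iteration above converges to a unique fixed point; a fitted variant would replace the exact backup with regression on sampled transitions.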