Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang

ICLR (2021)

Abstract

We study reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs). We propose an algorithm called FMDP-BF, which leverages the factorization structure of the FMDP. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and it improves on the best previous result for FMDPs (Osband and Van Roy, 2014) by a factor of nH|S_i|, where |S_i| is the cardinality of the factored state subspace, H is the planning horizon, and n is the number of factored transitions. To show the optimality of our bounds, we also provide a lower bound for FMDPs, which indicates that our algorithm is near-optimal with respect to the timestep T, the horizon H, and the factored state-action subspace cardinality. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provide the first sample-efficient algorithm for it, based on FMDP-BF.
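
To illustrate why exploiting factorization can shrink the problem exponentially, the following is a minimal sketch of the standard factored-transition structure, not the FMDP-BF algorithm itself. It assumes a toy FMDP invented for illustration: ten binary state components, two actions, and each component depending only on itself, its left neighbour, and the action. The names `flat_param_count`, `factored_param_count`, `factored_transition_prob`, and `scopes` are hypothetical and do not come from the paper.

```python
from itertools import product


def flat_param_count(state_sizes, num_actions):
    """Number of entries in a non-factored table P(s' | s, a)."""
    num_states = 1
    for size in state_sizes:
        num_states *= size
    return num_states * num_actions * num_states


def factored_param_count(state_sizes, num_actions, scopes):
    """Total entries across the factored tables P_i(s'_i | s[scope_i], a)."""
    total = 0
    for i, scope in enumerate(scopes):
        parent_configs = num_actions
        for j in scope:
            parent_configs *= state_sizes[j]
        total += parent_configs * state_sizes[i]
    return total


def factored_transition_prob(factors, scopes, state, action, next_state):
    """P(s' | s, a) as a product of per-component conditional probabilities."""
    prob = 1.0
    for i, scope in enumerate(scopes):
        parents = tuple(state[j] for j in scope)
        prob *= factors[i][(parents, action)][next_state[i]]
    return prob


# Toy setup (an assumption for illustration, not taken from the paper).
n = 10
num_actions = 2
state_sizes = [2] * n
scopes = [((i - 1) % n, i) for i in range(n)]  # left neighbour and self

flat = flat_param_count(state_sizes, num_actions)
fact = factored_param_count(state_sizes, num_actions, scopes)

# Uniform factors: every component moves to 0 or 1 with probability 0.5.
factors = [
    {
        (parents, a): [0.5, 0.5]
        for parents in product(range(2), repeat=2)
        for a in range(num_actions)
    }
    for _ in range(n)
]
p = factored_transition_prob(factors, scopes, (0,) * n, 0, (1,) * n)

print(flat, fact, p)
```

In this toy case the flat table grows with the product of the component sizes, while the factored tables grow only with the size of each small scope, which is the kind of gap that regret bounds depending on factored cardinalities rather than on the full state space reflect.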
