Quantum Markov Decision Processes Part II: Optimal Solutions and Algorithms

CoRR (2024)

Abstract

This two-part article aims to introduce a quantum analogue of classical Markov decision processes (MDPs). In Part II, building on the formulation of q-MDPs presented in Part I, our focus shifts to developing algorithms that compute optimal open-loop and closed-loop policies and their value functions. First, by using the duality between the dynamic programming and semi-definite programming formulations of any q-MDP with open-loop policies, we establish an algorithm that efficiently computes optimal open-loop quantum policies and value functions. Then, dynamic programming and semi-definite programming formulations for closed-loop policies are established, and the duality of these two formulations similarly enables the efficient computation of optimal closed-loop policies and value functions. Finally, any q-MDP can be approximated by q-MDPs with classical policies (potentially with higher-dimensional underlying Hilbert spaces than the original model), and any classical policy is an element of the set of closed-loop policies. We therefore conclude that any q-MDP can be approximated by q-MDPs with closed-loop policies having higher-dimensional Hilbert spaces.
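The duality between a dynamic programming formulation and a convex-programming formulation has a well-known classical special case: for a finite discounted MDP, the Bellman fixed point computed by value iteration is also the optimum of a linear program (the classical counterpart of the semi-definite program used for q-MDPs). The sketch below illustrates only that classical case, not the q-MDP algorithm of the paper. The two-state MDP, the discount factor `GAMMA`, and the helper names `value_iteration`, `lp_certificate`, and `greedy_policy` are all hypothetical choices made for illustration. Instead of calling an LP solver, it checks the LP constraints and complementary slackness directly on the value-iteration output.

```python
# Classical special case only: a hypothetical 2-state, 2-action discounted MDP.
# This is NOT a q-MDP; it illustrates the DP / convex-program duality in its simplest form.
GAMMA = 0.9
# P[s][a] = probabilities of moving to each next state from state s under action a
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[s][a] = immediate reward for taking action a in state s
R = [[1.0, 0.0], [0.0, 2.0]]


def q_value(V, s, a):
    """One-step lookahead: r(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')."""
    return R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))


def value_iteration(tol=1e-12, max_iter=10_000):
    """Dynamic programming: iterate the Bellman optimality operator to its fixed point."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        V_new = [max(q_value(V, s, a) for a in range(len(R[s]))) for s in range(len(R))]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


def lp_certificate(V, tol=1e-8):
    """Check V against the primal LP: minimize sum_s V(s) s.t. V(s) >= q_value(V, s, a).

    Feasibility plus at least one tight constraint per state (complementary
    slackness) means V equals the Bellman fixed point, i.e. the LP optimum.
    """
    states = range(len(R))
    feasible = all(V[s] >= q_value(V, s, a) - tol for s in states for a in range(len(R[s])))
    tight = all(any(abs(V[s] - q_value(V, s, a)) < tol for a in range(len(R[s]))) for s in states)
    return feasible and tight


def greedy_policy(V):
    """Extract a deterministic policy that attains the maximum in each Bellman equation."""
    return [max(range(len(R[s])), key=lambda a: q_value(V, s, a)) for s in range(len(R))]


if __name__ == "__main__":
    V_star = value_iteration()
    print("V* =", V_star)
    print("LP optimality certificate holds:", lp_certificate(V_star))
    print("greedy policy:", greedy_policy(V_star))
```

Shifting the value-iteration output by a constant breaks either feasibility or tightness, so `lp_certificate` returns `False` for it. In this classical picture, that is the sense in which the two formulations agree on the optimum.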