On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency
arXiv (2024)
Abstract
Identifying the trade-offs between model-based and model-free methods is a
central question in reinforcement learning. Value-based methods offer
substantial computational advantages and are sometimes just as statistically
efficient as model-based methods. However, focusing on the core problem of
policy evaluation, we show that information about the transition dynamics may be
impossible to represent in the space of value functions. We explore this
through a series of case studies focused on structures that arise in many
important problems. In several, there is no information loss and value-based
methods are as statistically efficient as model-based ones. In other
closely related examples, information loss is severe and value-based methods
are severely outperformed. A deeper investigation points to the limited
representational power of value functions as the driver of the inefficiency,
as opposed to a failure of algorithm design.