On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks
UAI(2024)
摘要
In this paper, we study the expressivity of scalar, Markovian reward
functions in Reinforcement Learning (RL), and identify several limitations to
what they can express. Specifically, we look at three classes of RL tasks;
multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive
necessary and sufficient conditions that describe when a problem in this class
can be expressed using a scalar, Markovian reward. Moreover, we find that
scalar, Markovian rewards are unable to express most of the instances in each
of these three classes. We thereby contribute to a more complete understanding
of what standard reward functions can and cannot express. In addition to this,
we also call attention to modal problems as a new class of problems, since they
have so far not been given any systematic treatment in the RL literature. We
also briefly outline some approaches for solving some of the problems we
discuss, by means of bespoke RL algorithms.
更多查看译文
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要