Fairness in Reinforcement Learning.
ICML (2017)
Abstract
We initiate the study of fair reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. We define a fairness constraint requiring that an algorithm never prefer one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: although fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take a number of rounds exponential in the number of states to achieve a non-trivial approximation to the optimal policy. We then provide a provably fair polynomial-time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
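The fairness constraint stated in the abstract can be illustrated with a small check on a single state. The sketch below is only one plausible reading, not the paper's formal definition: it assumes "prefers" means "selects with strictly higher probability", that the long-term discounted rewards are given as a list of known values, and that the approximate notion can be modeled by a tolerance `alpha`. The names `is_fair`, `action_probs`, `q_values`, and `alpha` are hypothetical and do not come from the source.

```python
from typing import Sequence


def is_fair(
    action_probs: Sequence[float],
    q_values: Sequence[float],
    alpha: float = 0.0,
) -> bool:
    """Check the fairness condition at one state (illustrative assumption).

    action_probs[a]: probability that the algorithm picks action a.
    q_values[a]:     long-term discounted reward of picking action a.
    alpha:           assumed tolerance; 0.0 gives the exact condition.
    """
    n = len(action_probs)
    for a in range(n):
        for b in range(n):
            # Violation: a is preferred over b even though b's long-term
            # reward exceeds a's by more than alpha.
            if action_probs[a] > action_probs[b] and q_values[b] > q_values[a] + alpha:
                return False
    return True


q = [1.0, 1.05, 2.0]
print(is_fair([0.2, 0.2, 0.6], q))             # ties among near-equal actions
print(is_fair([0.3, 0.1, 0.6], q))             # action 0 preferred over better action 1
print(is_fair([0.3, 0.1, 0.6], q, alpha=0.1))  # same policy under the relaxed check
```

Under these assumptions, the second policy violates the exact condition because it favors action 0 over the slightly better action 1, while the relaxed check with `alpha=0.1` accepts it because the two actions are within the tolerance.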