Towards Behavior-Level Explanation for Deep Reinforcement Learning

Xuan Chen,Zifan Wang,Yucai Fan,Bonan Jin,Piotr Mardziel,Carlee Joe-Wong,Anupam Datta

CoRR（2020）

引用 1|浏览25

暂无评分

摘要

While Deep Neural Networks (DNNs) are becoming the state-of-the-art for many tasks including reinforcement learning (RL), they are especially resistant to human scrutiny and understanding. Input attributions have been a foundational building block for DNN expalainabilty but face new challenges when applied to deep RL. We address the challenges with two novel techniques. We define a class of \emph{behaviour-level attributions} for explaining agent behaviour beyond input importance and interpret existing attribution methods on the behaviour level. We then introduce \emph{$\lambda$-alignment}, a metric for evaluating the performance of behaviour-level attributions methods in terms of whether they are indicative of the agent actions they are meant to explain. Our experiments on Atari games suggest that perturbation-based attribution methods are significantly more suitable to deep RL than alternatives from the perspective of this metric. We argue that our methods demonstrate the minimal set of considerations for adopting general DNN explanation technology to the unique aspects of reinforcement learning and hope the outlined direction can serve as a basis for future research on understanding Deep RL using attribution.

查看译文

关键词

deep reinforcement learning,behavior-level

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要