What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes

NIPS 2020 (2020)

Cited by: 0 | Views: 8 | Indexed: EI
Abstract

We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome. These explanations describe the outcome an agent is trying to achieve by its actions. We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning...

Introduction
  • Explaining the behaviour of machine learning algorithms or AI remains a key challenge in machine learning.
  • The consequences of the agent’s actions are not immediate and a chain of many decisions all contribute to a single desired outcome.
  • This paper addresses this problem by asking what chain of events the agent intended to happen as a result of a particular action choice.
  • Both value functions satisfy the Bellman equation, and are frequently estimated by Q-learning or Monte Carlo (MC)-like methods (a minimal sketch of the former follows this list)
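To make the kind of estimate referred to above concrete, here is a minimal tabular Q-learning step in the standard textbook form of Sutton and Barto [30]; it is a generic sketch rather than code from the paper, and the array layout (states by actions) is an assumption made for illustration.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update: move Q[s, a] towards the bootstrapped Bellman target."""
    td_target = r + gamma * np.max(Q[s_next])   # one-step Bellman backup over next-state actions
    Q[s, a] += alpha * (td_target - Q[s, a])    # temporal-difference correction
    return Q
```

A Monte Carlo estimate would instead replace the bootstrapped target with the full discounted return observed at the end of an episode.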
Highlights
  • Explaining the behaviour of machine learning algorithms or AI remains a key challenge in machine learning
  • The environment is modelled as a Markov decision process (MDP) [30], consisting of a 5-tuple (S, A, p, r, γ), where S is the set of all possible environment states, A is the set of actions, p(s_{t+1} | s_t, a_t) is the transitional dynamics of the MDP, γ ∈ [0, 1] is the discount factor, and r(s_t, a_t, s_{t+1}) ∈ ℝ is the reward for executing action a_t in state s_t and arriving at state s_{t+1} (the corresponding Bellman equation is written out after this list)
  • We have proposed a novel approach for explaining what outcomes are implicitly expected by reinforcement learning agents
  • We proposed a meaningful definition of intention for Reinforcement Learning (RL) agents and proved no post-hoc method could generate such explanations
  • We further showed how it can be extended to deep RL techniques and demonstrated its effectiveness on multiple reinforcement learning problems
  • One potential answer lies in the use of concept activation vectors [16], which allows for shared concepts between humans and machine learning algorithms
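For readability, the Bellman equation mentioned in the Introduction can be written out in the MDP notation defined above; this is the standard textbook form following Sutton and Barto [30], reproduced here for convenience rather than copied from the paper:

    Q^{\pi}(s_t, a_t) = \sum_{s_{t+1}} p(s_{t+1} \mid s_t, a_t)\,\Big[\, r(s_t, a_t, s_{t+1}) + \gamma \sum_{a_{t+1}} \pi(a_{t+1} \mid s_{t+1})\, Q^{\pi}(s_{t+1}, a_{t+1}) \,\Big]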
Methods
  • The authors evaluate the approach on three standard environments using OpenAI Gym [8] - Blackjack, Cartpole [7] and Taxi [11].
  • Each environment poses unique challenges.
  • [Figure: (a) Belief map for stick; (b) Belief map for hit; (c) Outcome belief map for stick; (d) Outcome belief map for hit. One plausible form of such belief-map updates is sketched after this list.]
  • The authors verify the correctness of the implementation in each environment using Theorem 2, and confirm equation (9) holds numerically.
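The methods bullets above refer to belief maps and outcome belief maps without reproducing the paper's update rule (its equation (9) is not quoted here). One plausible form such a map could take, consistent with the successor representation of Dayan [9] cited by the paper, is sketched below; this is an illustrative assumption, not the authors' published algorithm, and the tensor layout is invented for the example.

```python
import numpy as np

def belief_map_step(H, s, a, s_next, a_next, alpha=0.1, gamma=0.99):
    """Successor-representation-style update (in the spirit of Dayan [9]).

    H has shape (n_states, n_actions, n_states); H[s, a] is intended to approximate
    the expected discounted occupancy of future states after taking a in s and then
    following the current policy. Purely illustrative; not the paper's equation (9).
    """
    n_states = H.shape[0]
    one_hot_next = np.eye(n_states)[s_next]            # indicator of the state actually reached
    target = one_hot_next + gamma * H[s_next, a_next]  # bootstrap on the next state-action map
    H[s, a] += alpha * (target - H[s, a])              # temporal-difference correction
    return H
```

A map of this kind could then be visualised over the state space, which is one way belief maps like those in the figure above might be rendered.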
Results
  • The consequences of the agent’s actions are not immediate and a chain of many decisions all contribute to a single desired outcome.
  • This paper addresses this problem by asking what chain of events the agent intended to happen as a result of a particular action choice
  • The importance of such explanations based around intended outcome in day-to-day life is well-known in psychology with Malle [20] estimating that around 70% of these day-to-day explanations are intent-based
Conclusion
  • The authors have proposed a novel approach for explaining what outcomes are implicitly expected by reinforcement learning agents.
  • The authors proposed a meaningful definition of intention for RL agents and proved no post-hoc method could generate such explanations.
  • The authors proposed modifications of standard learning methods that generate such explanations for existing RL approaches, and proved consistency of the approaches with tabular methods.
  • While the approach is directly applicable to problems with mappable state-spaces, it lays the mathematical foundations for explainable agents in more complex systems
Related Work
  • Many approaches address interpretability for supervised machine learning. Traditional model-based machine-learning algorithms of restricted capacity, such as decision trees [23], GLM/GAMs [14] and RuleFit [12], are considered interpretable or intrinsically explainable due to their simple nature [25]. However, the dominance of model-free algorithms and deep learning means that most approaches are not considered intrinsically explainable, owing to their high complexity. Unavoidably, a trade-off between interpretability and performance exists, and focus has instead switched to local, post-hoc methods of explanation that give insight into complex classifiers. The two largest families of explanations are: (i) perturbation-based attribution methods [24, 19], which systematically alter input features and examine the changes in the classifier output, making it possible to build a local surrogate model that provides a local importance weight for each input feature; and (ii) gradient-based attribution methods [28, 29, 6, 27, 26], in which the attribution of input features is computed in a single forward and backward pass, the attribution comprising a derivative of the output with respect to the input. Fundamentally, both approaches use measures of feature importance to build a low-complexity model that locally approximates the underlying model. More recently, self-explainable neural networks (SENNs) [3, 31, 2] are end-to-end models that produce explanations of their own predictions. (A minimal sketch of the perturbation-based approach follows.)
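As a concrete illustration of the perturbation-based family, the following minimal sketch builds a local linear surrogate around a single input by randomly masking features and weighting the perturbed samples by proximity, in the spirit of [24]; all names and parameters here are illustrative and do not correspond to any particular library's API.

```python
import numpy as np

def local_surrogate_weights(predict_fn, x, n_samples=500, sigma=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around x by masking features.

    predict_fn maps a batch of inputs (n_samples, d) to scalar outputs (n_samples,).
    Returns one importance weight per input feature. Illustrative LIME-style sketch.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    masks = rng.integers(0, 2, size=(n_samples, d))               # which features to keep
    perturbed = masks * x                                         # masked copies of the input
    y = predict_fn(perturbed)                                     # model outputs on the perturbations
    proximity = np.exp(-np.sum(1 - masks, axis=1) / (sigma * d))  # nearby samples count more
    X = np.hstack([masks, np.ones((n_samples, 1))])               # linear model with an intercept
    w = np.sqrt(proximity)[:, None]
    coef, *_ = np.linalg.lstsq(w * X, w[:, 0] * y, rcond=None)    # weighted least squares
    return coef[:-1]                                              # per-feature importance weights
```

The returned weights play the role of the local importance weights described above; gradient-based methods obtain an analogous attribution directly from the derivative of the output with respect to the input.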
Funding
  • This work was partially supported by the UK Engineering and Physical Sciences Research Council (EPSRC), Omidyar Group and The Alan Turing Institute under grant agreements EP/S035761/1 and EP/N510129/1
References
  • 2018 reform of eu data protection rules, 2018. URL https://ec.europa.eu/commission/sites/beta-political/files/data-protection-factsheet-changes_en.pdf.
  • Maruan Al-Shedivat, Avinava Dubey, and Eric P. Xing. Contextual explanation networks, 2020.
  • David Alvarez-Melis and Tommi S. Jaakkola. Towards robust interpretability with selfexplaining neural networks, 2018.
  • Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay, 2018.
  • Raghuram Mandyam Annasamy and Katia Sycara. Towards Better Interpretability in Deep Q-Networks. Technical report, 2019. URL www.aaai.org.
  • Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 07 2015. doi: 10.1371/journal. pone.0130140. URL http://dx.doi.org/10.1371%2Fjournal.pone.0130140.
  • Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems, page 81–93. IEEE Press, 1990. ISBN 0818620153.
  • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym, 2016.
  • Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4):613–624, 1993.
  • Nicola De Cao, Ivan Titov, and Wilker Aziz. Block neural autoregressive flow. arXiv preprint arXiv:1904.04676, 2019.
  • Thomas G. Dietterich. Hierarchical reinforcement learning with the maxq value function decomposition. J. Artif. Int. Res., 13(1):227–303, November 2000. ISSN 1076-9757.
  • Jerome H. Friedman and Bogdan E. Popescu. Predictive learning via rule ensembles. Ann. Appl. Stat., 2(3):916–954, 09 2008. doi: 10.1214/07-AOAS148. URL https://doi.org/10.1214/07-AOAS148.
  • Hado V Hasselt. Double q-learning. In Advances in neural information processing systems, pages 2613–2621, 2010.
  • Trevor Hastie and Robert Tibshirani. Generalized additive models. Statist. Sci., 1(3):297–310, 08 1986. doi: 10.1214/ss/1177013604. URL https://doi.org/10.1214/ss/1177013604.
  • Zoe Juozapaitis, Anurag Koul, Alan Fern, Martin Erwig, and Finale Doshi-Velez. Explainable Reinforcement Learning via Reward Decomposition. Technical report, 2019.
  • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). arXiv preprint arXiv:1711.11279, 2017.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
  • Pat Langley. Explainable, normative, and justified agency. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9775–9779, 2019.
  • Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
  • B. F. Malle. Folk explanations of intentional action. Foundations of social cognition, 2001.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, February 2015. ISSN 00280836. URL http://dx.doi.org/10.1038/nature14236.
  • Alex Mott, Daniel Zoran, Mike Chrzanowski, Daan Wierstra, and Danilo J Rezende. Towards Interpretable Reinforcement Learning Using Attention Augmented Agents. Technical report, 2019.
  • J. R. Quinlan. Induction of decision trees. Mach. Learn., 1(1):81–106, March 1986. ISSN 0885-6125. doi: 10.1023/A:1022643204877. URL https://doi.org/10.1023/A:1022643204877.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?" Explaining the Predictions of Any Classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, volume 13-17-Augu, pages 1135–1144, 2016. ISBN 9781450342322. doi: 10.1145/2939672.2939778. URL http://dx.doi.org/10.1145/2939672.2939778.
  • Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
  • W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K. Müller. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2660–2673, 2017.
  • Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Batra. Grad-cam: Why did you say that? visual explanations from deep networks via gradient-based localization. CoRR, abs/1610.02391, 2016. URL http://arxiv.org/abs/1610.02391.
  • Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences, 2017.
  • Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks, 2017.
  • Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA, 2018. ISBN 0262039249.
  • Stefano Teso. Toward faithful explanatory active learning with self-explainable neural nets. 2019.
  • Nicholay Topin and Manuela Veloso. Generation of Policy-Level Explanations for Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 33:2514–2521, 2019. ISSN 2159-5399. doi: 10.1609/aaai.v33i01.33012514. URL www.aaai.org.
  • Jasper van der Waa, Jurriaan van Diggelen, Karel van den Bosch, and Mark Neerincx. Contrastive explanations for reinforcement learning in terms of expected consequences, 2018.
  • Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. Programmatically interpretable reinforcement learning. In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 5052–5061. PMLR, 2018. URL http://proceedings.mlr.press/v80/verma18a.html.
Authors
Ho Man Herman Yau
Chris Russell
Simon Hadfield