Towards A Rigorous Science of Interpretable Machine Learning

arXiv: Machine Learning, 2017

Citations: 918 | Views: 537

Abstract

As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. …

Introduction
  • From autonomous cars and adaptive email filters to predictive policing systems, machine learning (ML) systems are increasingly ubiquitous; they outperform humans on specific tasks [Mnih et al., 2013, Silver et al., 2016, Hamill, 2017] and often guide processes of human understanding and decisions [Carton et al., 2016, Doshi-Velez et al., 2014].
  • The authors might not be able to enumerate all unit tests required for the safe operation of a semi-autonomous car, or all confounds that might cause a credit-scoring system to be discriminatory.
  • In such cases, a popular fallback is the criterion of interpretability: if the system can explain its reasoning, the authors can verify whether that reasoning is sound with respect to these auxiliary criteria.
  • The first common evaluation approach assesses interpretability in the context of an application; the second evaluates interpretability via a quantifiable proxy: a researcher might first claim that some model class—e.g. sparse linear models, rule lists, gradient boosted trees—is interpretable and then present algorithms to optimize within that class (e.g. Bucilu et al. [2006], Wang et al. [2017], Wang and Rudin [2015], Lou et al. [2012]).
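To make the quantifiable-proxy route concrete, here is a minimal sketch that fits a model from a commonly claimed-interpretable class (a sparse linear model) and reports which features it actually uses. The synthetic dataset, the scikit-learn Lasso estimator, and the regularization strength are illustrative assumptions, not choices made in the paper.

```python
# Minimal sketch, assuming scikit-learn and a synthetic dataset; the paper
# surveys this style of work but does not prescribe these particular choices.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real application (assumption).
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Optimize within a claimed-interpretable model class: sparse linear models.
model = Lasso(alpha=1.0).fit(X_train, y_train)

# Sparsity is one quantifiable proxy: fewer active features, simpler explanation.
active = np.flatnonzero(model.coef_)
print(f"test R^2: {model.score(X_test, y_test):.3f}")
print(f"features with nonzero weight ({len(active)} of {X.shape[1]}): {active.tolist()}")
```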
Highlights
  • The claim of the research should match the type of the evaluation.
  • Just as one would be critical of a reliability-oriented paper that only cites accuracy statistics, the choice of evaluation should match the specificity of the claim being made.
  • A contribution that is focused on a particular application should be expected to be evaluated in the context of that application, or on a human experiment with a closely-related task.
  • A contribution that is focused on better optimizing a model class for some definition of interpretability should be expected to be evaluated with functionally-grounded metrics.
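As a rough illustration of a functionally-grounded evaluation, the sketch below reports a formal proxy for explanation complexity (here, the number of features with nonzero weight) alongside predictive accuracy, with no human subjects involved. The dataset, the two candidate models, and the specific proxy are assumptions made for this example; the paper does not endorse any single proxy.

```python
# Sketch of a functionally-grounded comparison, assuming scikit-learn:
# report a proxy for explanation complexity next to predictive accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = {
    "dense_l2": LogisticRegression(penalty="l2", max_iter=5000),
    "sparse_l1": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
}

for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    accuracy = clf.score(X_te, y_te)
    # Illustrative proxy: number of features the linear explanation relies on.
    complexity = int(np.count_nonzero(clf.coef_))
    print(f"{name}: accuracy={accuracy:.3f}, active features={complexity}")
```

Under this reading, a claim about better optimizing an interpretable model class would be supported by showing lower proxy complexity at comparable accuracy.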
Conclusion
  • Recommendations for researchers: in this work, the authors have laid the groundwork for a process to rigorously define and evaluate interpretability.
  • A contribution that is focused on a particular application should be expected to be evaluated in the context of that application, or on a human experiment with a closely-related task.
  • A contribution that is focused on better optimizing a model class for some definition of interpretability should be expected to be evaluated with functionally-grounded metrics.
  • The authors must be careful in work on interpretability, recognizing both the need for and the costs of human-subject experiments.
Contributions
  • Proposes a taxonomy for the evaluation of interpretability—application-grounded, human-grounded and functionally-grounded (summarized in the sketch after this list).
  • Proposes data-driven ways to derive operational definitions and evaluations of explanations and interpretability.
  • Argues that interpretability can assist in qualitatively ascertaining whether other desiderata—such as fairness, privacy, reliability, robustness, causality, usability and trust—are met.
  • Argues that the need for interpretability stems from an incompleteness in the problem formalization, creating a fundamental barrier to optimization and evaluation.
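The proposed taxonomy can be summarized as a small lookup from evaluation type to who judges the explanations and how realistic the task is; the wording below is a paraphrase for illustration, not terminology fixed by the paper.

```python
# Paraphrased summary of the evaluation taxonomy (assumed field names).
from dataclasses import dataclass

@dataclass(frozen=True)
class EvaluationLevel:
    humans: str  # who, if anyone, judges the explanations
    task: str    # how close the task is to the target application

TAXONOMY = {
    "application-grounded": EvaluationLevel(humans="domain experts",
                                            task="real application task"),
    "human-grounded": EvaluationLevel(humans="lay humans",
                                      task="simplified or abstracted task"),
    "functionally-grounded": EvaluationLevel(humans="none",
                                             task="proxy metric on a formal definition"),
}

for level, spec in TAXONOMY.items():
    print(f"{level}: judged by {spec.humans}; task = {spec.task}")
```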
References
  • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mane. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
  • Pedro Antunes, Valeria Herskovic, Sergio F Ochoa, and Jose A Pino. Structuring dimensions for collaborative systems evaluation. ACM Computing Surveys, 2012.
  • William Bechtel and Adele Abrahamsen. Explanation: A mechanist alternative. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 2005.
  • Catherine Blake and Christopher J Merz. UCI repository of machine learning databases. 1998.
  • Nick Bostrom and Eliezer Yudkowsky. The ethics of artificial intelligence. The Cambridge Handbook of Artificial Intelligence, 2014.
  • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  • Cristian Bucilu, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
  • Samuel Carton, Jennifer Helsby, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Joe Walsh, Crystal Cody, CPT Estella Patterson, Lauren Haynes, and Rayid Ghani. Identifying police officers at risk of adverse events. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
  • Jonathan Chang, Jordan L Boyd-Graber, Sean Gerrish, Chong Wang, and David M Blei. Reading tea leaves: How humans interpret topic models. In NIPS, 2009.
  • Nick Chater and Mike Oaksford. Speculations on human causal learning and reasoning. Information Sampling and Adaptive Cognition, 2006.
  • Finale Doshi-Velez, Yaorong Ge, and Isaac Kohane. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics, 133(1):e54–e63, 2014.
  • Finale Doshi-Velez, Byron Wallace, and Ryan Adams. Graph-sparse LDA: a topic model with structured sparsity. Association for the Advancement of Artificial Intelligence, 2015.
  • Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science Conference. ACM, 2012.
  • Alex Freitas. Comprehensible classification models: a position paper. ACM SIGKDD Explorations, 2014.
  • Vikas K Garg and Adam Tauman Kalai. Meta-unsupervised-learning: A supervised approach to unsupervised learning. arXiv preprint arXiv:1612.09030, 2016.
  • Stuart Glennan. Rethinking mechanistic explanation. Philosophy of Science, 2002.
  • Bryce Goodman and Seth Flaxman. European Union regulations on algorithmic decision-making and a "right to explanation". arXiv preprint arXiv:1606.08813, 2016.
  • Maya Gupta, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin Canini, Alexander Mangylov, Wojciech Moczydlowski, and Alexander Van Esbroeck. Monotonic calibrated interpolated look-up tables. Journal of Machine Learning Research, 2016.
  • Hamill. CMU computer won poker battle over humans by statistically significant margin. http://www.post-gazette.com/business/tech-news/2017/01/31/CMU-computerwon-poker-battle-over-humans-by-statistically-significant-margin/stories/201701310250, 2017. Accessed: 2017-02-07.
  • Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In ACM Symposium on Theory of Computing. ACM, 2010.
  • Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, 2016.
  • Carl Hempel and Paul Oppenheim. Studies in the logic of explanation. Philosophy of Science, 1948.
  • Tin Kam Ho and Mitra Basu. Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
  • Frank Keil. Explanation and understanding. Annu. Rev. Psychol., 2006.
  • Frank Keil, Leonid Rozenblit, and Candice Mills. What lies beneath? Understanding the limits of understanding. Thinking and Seeing: Visual Metacognition in Adults and Children, 2004.
  • Been Kim, Caleb Chacha, and Julie Shah. Inferring robot task plans from human team meetings: A generative modeling approach with logic-based prior. Association for the Advancement of Artificial Intelligence, 2013.
  • Been Kim, Elena Glassman, Brittney Johnson, and Julie Shah. iBCM: Interactive Bayesian case model empowering humans via intuitive interaction. 2015a.
  • Been Kim, Julie Shah, and Finale Doshi-Velez. Mind the gap: A generative approach to interpretable feature selection and extraction. In Advances in Neural Information Processing Systems, 2015b.
  • Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1675–1684. ACM, 2016.
  • Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. Research Methods in Human-Computer Interaction. John Wiley & Sons, 2010.
  • Tao Lei, Regina Barzilay, and Tommi Jaakkola. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.
  • Tania Lombrozo. The structure and function of explanations. Trends in Cognitive Sciences, 10(10):464–470, 2006.
  • Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
  • Ian Neath and Aimee Surprenant. Human Memory. 2003.
  • Clemens Otte. Safe and interpretable machine learning: A methodological review. In Computational Intelligence in Intelligent Data Analysis. Springer, 2013.
  • Parliament and Council of the European Union. General Data Protection Regulation. 2016.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. arXiv preprint arXiv:1602.04938, 2016.
  • Salvatore Ruggieri, Dino Pedreschi, and Franco Turini. Data mining for discrimination discovery. ACM Transactions on Knowledge Discovery from Data, 2010.
  • Eric Schulz, Joshua Tenenbaum, David Duvenaud, Maarten Speekenbrink, and Samuel Gershman. Compositional inductive biases in function learning. bioRxiv, 2016.
  • D Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems, 2015.
  • David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
  • Lior Jacob Strahilevitz. Privacy versus antidiscrimination. University of Chicago Law School Working Paper, 2008.
  • Adi Suissa-Peleg, Daniel Haehn, Seymour Knowles-Barley, Verena Kaynig, Thouis R Jones, Alyssa Wilson, Richard Schalek, Jeffery W Lichtman, and Hanspeter Pfister. Automatic neural reconstruction from petavoxel of electron microscopy data. Microscopy and Microanalysis, 2016.
  • Vincent Toubiana, Arvind Narayanan, Dan Boneh, Helen Nissenbaum, and Solon Barocas. Adnostic: Privacy preserving targeted advertising. 2010.
  • Joaquin Vanschoren, Jan N Van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, 2014.
  • Kush Varshney and Homa Alemzadeh. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. CoRR, 2016.
  • Fulton Wang and Cynthia Rudin. Falling rule lists. In AISTATS, 2015.
  • Tong Wang, Cynthia Rudin, Finale Doshi-Velez, Yimin Liu, Erica Klampfl, and Perry MacNeille. Bayesian rule sets for interpretable classification. In International Conference on Data Mining, 2017.
  • Joseph Jay Williams, Juho Kim, Anna Rafferty, Samuel Maldonado, Krzysztof Z Gajos, Walter S Lasecki, and Neil Heffernan. AXIS: Generating explanations at scale with learnersourcing and machine learning. In ACM Conference on Learning@Scale. ACM, 2016.
  • Andrew Wilson, Christoph Dann, Chris Lucas, and Eric Xing. The human kernel. In Advances in Neural Information Processing Systems, 2015.