
# Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification

EMNLP 2020, pp.4211-4221, (2020)

Abstract

Interpretability of predictive models is becoming increasingly important with growing adoption in the real-world. We present RuleNN, a neural network architecture for learning transparent models for sentence classification. The models are in the form of rules expressed in first-order logic, a dialect with well-defined, human-understandabl...


Introduction

- Difficult-to-interpret, black-box predictive models have been shown to harbor undesirable biases (e.g., racial bias in computing risk of recidivism among criminals (Angwin et al, 2016; Liptak, 2017)).
- While various techniques for explainability exist (see survey by Guidotti et al (2018)), one popular approach explains predictions from a black-box model by using a surrogate model (Ribeiro et al, 2016).
- Another extracts explanations from neural network layer activations, especially when said activations appeal to human intuition such as attention (Bahdanau et al, 2015) which may be interpreted as importance weights assigned to features derived by the model.
- In other words, is it possible to devise a neural network that directly learns a model expressed in a clear, human-readable dialect?

Highlights

- Difficult-to-interpret, black-box predictive models have been shown to harbor undesirable biases (e.g., racial bias in computing risk of recidivism among criminals (Angwin et al, 2016; Liptak, 2017))
- We show how to extract linguistic expressions (LEs) expressed in crisp first-order logic (FOL) from RuleNN post hoc that may, in turn, be handed to domain experts for verification and even modification, to instill further domain expertise going beyond the available training data.
- Our experiments indicate that neuro-symbolic RuleNN outperforms other rule induction techniques in terms of efficiency and quality of rules learned even in the presence of challenging conditions such as class skew
- RuleNN can be used for any multiple instance learning (MIL) task assuming predicates are given and PGMs can be used to learn combinations of base predicates P even if the structure of the rule differs from LEs
- It may even be possible to determine the number of LEs k from the data using recurrent neural networks (Yang et al, 2017)
- We show that it is possible to learn human-interpretable models by designing neural networks keeping explainability in mind
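The highlights above describe LEs as conjunctions of base predicates, learned with differentiable operations and combined disjunctively under a multiple instance learning (MIL) view. A minimal sketch of that idea, using a product t-norm as soft AND and max-pooling as soft OR; the function names, shapes, and softmax-based predicate selection are assumptions for illustration, not the paper's actual code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_clause(pred_vals, logits):
    """Soft conjunction of m predicates chosen from P.

    pred_vals: (n_instances, n_predicates) predicate truth values in [0, 1]
    logits:    (m, n_predicates) one learnable row per predicate slot
    Returns one score per instance: product t-norm over the m soft slots.
    """
    alphas = np.stack([softmax(row) for row in logits])  # (m, |P|)
    slot_scores = pred_vals @ alphas.T                   # (n, m) soft predicate per slot
    return slot_scores.prod(axis=1)                      # differentiable AND

def sentence_score(pred_vals, clause_logits):
    """MIL-style disjunction: the sentence scores high if any clause
    fires on any of its instances (max-pooling as soft OR)."""
    return max(soft_clause(pred_vals, lg).max() for lg in clause_logits)
```

Replacing hard AND/OR with product and max keeps the whole model end-to-end differentiable, which is the sense in which the summary says RuleNN "replaces logical operations with differentiable functions".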

Methods

- Datasets: The authors experiment with two datasets: TREC (Li and Roth, 2002) comprising questions, and the real-world Contracts data comprising sentences from legal contracts among enterprises.
- Table 2 provides broad-level statistics.
- Sentences in Contracts may be labeled with 0, 1 or more labels, so the authors treat each label as a binary class labeling task.
- Table 3(a) (Contracts: label statistics) lists, for each label, its class skew and number of predicates |P|: SoW (skew 0.07, 48 predicates), DR (0.06, 80), P&T (0.10, 117), T&T (0.08, 77), P&B (0.05, 95).
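Since Contracts sentences may carry zero or more labels, each label is treated as an independent binary classification task. A small illustrative sketch of that one-vs-rest reduction (function name and data shapes are hypothetical):

```python
def binarize_by_label(dataset):
    """dataset: list of (sentence, set_of_labels) pairs.
    Returns one binary-labeled copy of the data per label,
    i.e., the per-label binary task described above."""
    all_labels = set()
    for _, labels in dataset:
        all_labels |= labels
    return {
        label: [(sent, int(label in labels)) for sent, labels in dataset]
        for label in all_labels
    }
```

Note that this reduction naturally produces the class skew reported in Table 3: for any single label, most sentences are negatives, which is what motivates the negative sampling mentioned later.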

Conclusion

**Conclusion and Future Work**

- The authors' experiments indicate that neuro-symbolic RuleNN outperforms other rule induction techniques in terms of efficiency and quality of rules learned, even in the presence of challenging conditions such as class skew.
- RuleNN can be used for any MIL task assuming predicates are given, and PGMs can be used to learn combinations of base predicates P even if the structure of the rule differs from LEs. As an extension, it may even be possible to determine the number of LEs k from the data using recurrent neural networks (Yang et al, 2017).
- The authors show that it is possible to learn human-interpretable models by designing neural networks keeping explainability in mind.

Summary

## Objectives:

Such approaches leave room for improvement because explainability is treated as an afterthought, whereas the goal is to treat it as a first-class citizen.

- Table 1: Notation with description.
- Learning more than one LE, i.e., assigning the label if any LE holds true for the sentence, can lead to improved results.
- Table 2: Broad-level dataset statistics.
- RuleNN learns k LEs containing up to m PPs each. To handle class skew, i.e., when D consists of more negative than positive examples, we utilize negative sampling (Mikolov et al, 2013). We also apply dropout (Srivastava et al, 2014) just before max-pooling to zero out outputs from randomly chosen CGMs. Once learning has converged, we can use Algorithm 1 to retrieve LEs expressed in FOL. Given α1, . . ., αm learned from a single CGM, Algorithm 1 considers each m-combination of predicates from P and returns it as an LE if (Line 4): 1) its associated weight (the product of the corresponding entries of αi, ∀i = 1, . . ., m) is non-zero, and 2) it evaluates to true on some instance in D. When learning k CGMs, we invoke Algorithm 1 once per CGM and union the LEs. Algorithm 1's complexity is exponential in m, but it is efficient for short LEs, which makes sense since longer LEs are hard to interpret. In practice, post-hoc retrieval results in a few hundred LEs (Section 5 discusses how to navigate such a set of LEs).
- Table 3: Dataset statistics and AUC-PR results. Part (a) lists the number of predicates constructed using hand-crafted dictionaries for each label following the process described in Section 3. We use TREC's standard train/test split to aid comparison; TREC also exhibits significant class skew (Table 3(b)). We automatically construct dictionaries by capturing surface forms (from the training set) that discriminate well among its labels and construct predicates by extracting the same syntactic and semantic arguments stated previously. Methods compared: RuleNN learns k = 50 LEs containing up to m = 4 predicates. We set
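The post-hoc retrieval step (Algorithm 1 in the paper) can be sketched as follows. This is a loose reconstruction from the summary text only; the function names, the `eps` threshold, and the mapping of combination positions to slots are assumptions:

```python
from itertools import combinations
import numpy as np

def retrieve_les(alphas, pred_vals, eps=1e-6):
    """alphas:    (m, |P|) weights learned by one CGM, one row per slot
    pred_vals: (n_instances, |P|) boolean predicate evaluations on D
    Returns LEs as tuples of predicate indices."""
    m, n_preds = alphas.shape
    les = []
    for combo in combinations(range(n_preds), m):
        # 1) weight of the LE: product of each slot's weight for its predicate
        weight = np.prod([alphas[i, p] for i, p in enumerate(combo)])
        if weight <= eps:
            continue
        # 2) the conjunction must evaluate to true on some instance in D
        if pred_vals[:, list(combo)].all(axis=1).any():
            les.append(combo)
    return les
```

As the text notes, the loop is exponential in m (it visits C(|P|, m) combinations), which stays cheap for the short, interpretable LEs the method targets; with k CGMs one would call this once per CGM and union the results.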

Related Work

- Inductive logic programming (ILP) (Muggleton, 1996) learns rules that perfectly entail the positive examples and reject all negatives. Top-down ILP systems (Muggleton et al, 2008; Corapi et al, 2010; Cropper and Muggleton, 2015), in particular, generate rules before testing them on data. Since a 0-error rule may not exist, noise-tolerant ILP (Muggleton et al, 2018) learns rules that minimize error, which is better suited for noisy real-world scenarios. We compare RuleNN against top-down and noise-tolerant ILP in Section 5.

Markov logic network (MLN) (Richardson and Domingos, 2006), a member of statistical relational learning (StarAI) (Getoor and Taskar, 2007), comprises weighted rules to extend Markov random fields (Pearl, 1988) to the first-order setting. A long line of work exploring various techniques culminated in the LSM heuristic (Kok and Domingos, 2010) that learns MLN rules before estimating parameters. Since such a stepwise approach can be computationally expensive, BoostSRL (Khot et al, 2011) jointly learns rules and parameters by approximating the gradient using functional gradient boosting (Friedman, 2001). RuleNN replaces logical operations with differentiable functions, thus learning LEs end-to-end without approximations. Section 5 reports results of LSM and BoostSRL.

Study Subjects and Analysis

data scientists: 4

5.3 Human-Machine Co-creation: User Study. Having shown that RuleNN learns explainable, high-quality LEs, we were interested in finding out whether domain experts find the same and, in particular, whether the interaction improves the LEs. Four data scientists, with knowledge of NLU and FOL, were given 188 LEs learned for C. The goal was to select LEs whose semantics could be verified.

participants: 3

This reduction from 188 LEs translates to a 96% model compression and shows that, with human expertise, RuleNN's LEs can be made smaller and thus more interpretable. To model collaborative and iterative development in the real world, we union the LEs produced by each subset of 3 participants to obtain 4 explainable models. As Figure 8(e) shows, 3 of these outperform BiLSTM by ≈ 25% in terms of F-measure (precision and
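The co-creation protocol above (union the LEs kept by each 3-person subset of participants, then assign a label whenever any LE in the unioned model holds) can be sketched like this; the names and data shapes are illustrative assumptions:

```python
from itertools import combinations

def committee_models(kept, size):
    """kept: {participant: set of LE ids}.
    Returns one unioned model per `size`-person subset of participants,
    mirroring the 4 models built from subsets of 3 participants above."""
    return {
        group: set().union(*(kept[p] for p in group))
        for group in combinations(sorted(kept), size)
    }

def predict(model, le_holds):
    """A sentence receives the label if any LE in the model holds for it,
    i.e., the model is a disjunction of LEs."""
    return any(le_holds(le) for le in model)
```

The union operation is what makes the workflow collaborative: each expert prunes independently, and the committee model keeps any LE at least one expert verified.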

引用论文

- Jaume Amores. 2013. Multiple instance classification: Review, taxonomy and comparative study. Artificial Intelligence.
- Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias. www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
- Luke Bjerring and Eibe Frank. 2011. Beyond trees: Adopting MITI to learn rules and ensemble classifiers for multi-instance data. In International Conference on Advances in Artificial Intelligence.
- BlackBoxNLP. 2019. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
- Hendrik Blockeel, David Page, and Ashwin Srinivasan. 2005. Multi-instance tree learning. In ICML.
- Lingyang Chu, Xia Hu, Juhua Hu, Lanjun Wang, and Jian Pei. 2018. Exact and consistent interpretation for piecewise linear neural networks: A closed form solution. In KDD.
- Domenico Corapi, Alessandra Russo, and Emil Lupu. 2010. Inductive logic programming as abductive search. LIPIcs-Leibniz International Proceedings in Informatics, Vol. 7. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
- Andrew Cropper and Stephen H. Muggleton. 2015. Logical minimisation of meta-rules within meta-interpretive learning. In ILP.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
- Jerome H. Friedman. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics.
- Lise Getoor and Ben Taskar. 2007. Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning). The MIT Press.
- Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys.
- Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. 2020. Neural module networks for reasoning over text. In ICLR.
- Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long short-term memory. Neural Computation.
- Dan Jurafsky and James H. Martin. 2014. Speech and language processing, volume 3. Prentice Hall, Pearson Education International.
- Seyed Mehran Kazemi and David Poole. 2018. RelNN: A deep neural model for relational learning. In AAAI.
- Tushar Khot, Sriraam Natarajan, Kristian Kersting, and Jude Shavlik. 2011. Learning markov logic networks via functional gradient boosting. In ICDM.
- Stanley Kok and Pedro Domingos. 2010. Learning markov logic networks using structural motifs. In ICML.
- Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2008. SystemT: A system for declarative information extraction. ACM SIGMOD Record.
- Legal Categories. https://cloud.ibm.com/docs/services/discovery?topic=discovery-contract_parsing#contract_categories.
- Xin Li and Dan Roth. 2002. Learning question classifiers. In COLING.
- Adam Liptak. 2017. Sent to prison by a software program's secret algorithms. www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret-algorithms.html.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NeurIPS.
- Mehrad Moradshahi, Hamid Palangi, Monica S Lam, Paul Smolensky, and Jianfeng Gao. 2019. HUBERT untangles BERT to improve transfer across NLP tasks. arXiv preprint arXiv:1910.12647.
- Stephen Muggleton. 1996. Learning from positive data. In Workshop on ILP.
- Stephen Muggleton, Wang-Zhou Dai, Claude Sammut, Alireza Tamaddoni-Nezhad, Jing Wen, and Zhi-Hua Zhou. 2018. Meta-interpretive learning from noisy images. Machine Learning.
- Stephen H. Muggleton, Jose Carlos Almeida Santos, and Alireza Tamaddoni-Nezhad. 2008. Toplog: ILP using a logic program declarative bias. In International Conference on Logic Programming.
- Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics.
- Nikolaos Pappas and Andrei Popescu-Belis. 2014. Explaining the stars: Weighted multiple-instance learning for aspect-based sentiment analysis. In EMNLP.
- Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
- Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In KDD.
- Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning.
- Tim Rocktaschel and Sebastian Riedel. 2017. End-to-end differentiable proving. In NeurIPS.
- Sofia Serrano and Noah A. Smith. 2019. Is attention interpretable? In ACL.
- Parag Singla and Pedro Domingos. 2006. Entity resolution with markov logic. In ICDM.
- Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. 2018. Lifted relational neural networks: Efficient learning of latent relational structures. JAIR.
- Akash Srivastava and Charles Sutton. 2017. Autoencoding variational inference for topic models. In ICLR.
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR.
- Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, and Wenyu Liu. 2018. Revisiting multiple instance neural networks. Pattern Recognition.
- Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In EMNLP.
- Fan Yang, Zhilin Yang, and William W Cohen. 2017. Differentiable learning of logical rules for knowledge base reasoning. In NeurIPS.
- Yiwei Yang, Eser Kandogan, Yunyao Li, Walter S. Lasecki, and Prithviraj Sen. 2019. HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop. In ACL.
