Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification

EMNLP 2020, pp. 4211-4221 (2020)


Abstract

Interpretability of predictive models is becoming increasingly important with growing adoption in the real world. We present RuleNN, a neural network architecture for learning transparent models for sentence classification. The models are in the form of rules expressed in first-order logic, a dialect with well-defined, human-understandable semantics. [...]

Introduction
  • Difficult-to-interpret, black-box predictive models have been shown to harbor undesirable biases (e.g., racial bias in computing the risk of recidivism among criminals (Angwin et al., 2016; Liptak, 2017)).
  • While various techniques for explainability exist (see the survey by Guidotti et al. (2018)), one popular approach explains predictions from a black-box model using a surrogate model (Ribeiro et al., 2016).
  • Another extracts explanations from neural network layer activations, especially when those activations appeal to human intuition, such as attention (Bahdanau et al., 2015), which may be interpreted as importance weights assigned to features derived by the model.
  • Such approaches leave room for improvement because explainability is treated as an afterthought, whereas the goal is to treat it as a first-class citizen. In other words, is it possible to devise a neural network that directly learns a model expressed in a clear, human-readable dialect?
Highlights
  • Difficult-to-interpret, black-box predictive models have been shown to harbor undesirable biases (e.g., racial bias in computing the risk of recidivism among criminals (Angwin et al., 2016; Liptak, 2017)).
  • We show how to extract linguistic expressions (LEs) expressed in crisp first-order logic (FOL) from RuleNN post hoc; these may, in turn, be handed to domain experts for verification and even modification, to instill further domain expertise beyond the available training data.
  • Our experiments indicate that neuro-symbolic RuleNN outperforms other rule induction techniques in terms of efficiency and the quality of rules learned, even under challenging conditions such as class skew.
  • RuleNN can be used for any multiple instance learning (MIL) task, assuming predicates are given, and PGMs can be used to learn combinations of base predicates P even if the structure of the rule differs from LEs (a toy sketch of this soft rule evaluation follows this list).
  • It may even be possible to determine the number of LEs k from the data using recurrent neural networks (Yang et al., 2017).
  • We show that it is possible to learn human-interpretable models by designing neural networks with explainability in mind.
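To make the soft rule evaluation referenced above concrete, here is a minimal sketch; all shapes, names, and tensors are illustrative assumptions, not the authors' exact RuleNN modules. Conjunction over m softly selected predicates is relaxed to a product, and the MIL semantics (the label fires if some LE holds on some instance) to max-pooling, so the whole pipeline stays differentiable:

```python
import torch

# Hypothetical shapes, for illustration only: a sentence is a bag of n
# instances (e.g., chunks), each described by truth values of |P| base
# predicates in [0, 1].
n, num_preds, m, k = 6, 10, 3, 4    # instances, predicates, conjuncts, LEs

X = torch.rand(n, num_preds)        # soft predicate evaluations per instance

# Each of the k rule modules picks m predicates via a softmax over P;
# alpha[j, i] is the selection distribution for conjunct i of LE j.
logits = torch.randn(k, m, num_preds, requires_grad=True)
alpha = torch.softmax(logits, dim=-1)

# Soft conjunction: product over the m softly mixed predicates.
# selected[j, i, t] = sum_p alpha[j, i, p] * X[t, p]
selected = torch.einsum('jip,tp->jit', alpha, X)
conj = selected.prod(dim=1)         # (k, n): LE j evaluated on instance t

# MIL semantics: an LE holds for the sentence if it holds on some instance
# (max over instances); the label fires if any of the k LEs holds.
score = conj.max(dim=1).values.max()
print(float(score))
```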
Methods
  • Datasets: The authors experiment with two datasets: TREC (Li and Roth, 2002), comprising questions, and the real-world Contracts data, comprising sentences from legal contracts among enterprises.
  • Table 2 provides broad-level statistics.
  • Sentences in Contracts may carry zero, one, or more labels, so the authors treat each label as a separate binary classification task (a small illustration follows the table below).
  • Table 3 (a) Contracts: label statistics (class skew and number of predicates |P| per label):

      Label  Skew  |P|
      SoW    0.07   48
      DR     0.06   80
      P&T    0.10  117
      T&T    0.08   77
      P&B    0.05   95
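As a hedged sketch of the per-label binary framing above (the corpus below is made up; only the construction mirrors the description), each label yields its own binary task, and the class skew in Table 3 (a) is simply the fraction of positives per task:

```python
# Hypothetical multi-label data, for illustration only.
corpus = [
    ("Payment is due within 30 days.", {"P&B"}),
    ("Either party may terminate with notice.", {"T&T"}),
    ("Disputes shall be settled by arbitration.", {"DR"}),
    ("This section is informational.", set()),
]
labels = {"SoW", "DR", "P&T", "T&T", "P&B"}

# One binary task per label: a sentence is positive for label L iff L is
# among its labels; every other sentence is a negative.
def binary_task(label):
    return [(text, label in ls) for text, ls in corpus]

# Skew per label = fraction of positives, as reported in Table 3 (a).
for label in sorted(labels):
    task = binary_task(label)
    skew = sum(y for _, y in task) / len(task)
    print(label, round(skew, 2))
```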
Conclusion
  • The authors' experiments indicate that neuro-symbolic RuleNN outperforms other rule induction techniques in terms of efficiency and the quality of rules learned, even under challenging conditions such as class skew.
  • RuleNN can be used for any MIL task, assuming predicates are given, and PGMs can be used to learn combinations of base predicates P even if the structure of the rule differs from LEs. As an extension, it may even be possible to determine the number of LEs k from the data using recurrent neural networks (Yang et al., 2017).
  • The authors show that it is possible to learn human-interpretable models by designing neural networks with explainability in mind.
Tables
  • Table 1: Notation with descriptions. The adjoining text notes that learning more than one LE (i.e., the label is assigned if any LE holds true for the sentence) can lead to improved results.
  • Table 2: Broad-level dataset statistics. The adjoining text: RuleNN learns k LEs containing up to m PPs each. To handle class skew, i.e., when D consists of more negative than positive examples, we utilize negative sampling (Mikolov et al., 2013). We also apply dropout (Srivastava et al., 2014) just before max-pooling to zero out the outputs of randomly chosen CGMs. Once learning has converged, we can use Algorithm 1 to retrieve LEs expressed in FOL. Given α1, ..., αm learned from a single CGM, Algorithm 1 considers each m-combination of predicates from P and returns it as an LE if (Line 4): 1) its associated weight (the product of the corresponding entries of αi, for i = 1, ..., m) is non-zero, and 2) it evaluates to true on some instance in D. When learning k CGMs, we invoke Algorithm 1 once per CGM and take the union of the LEs. Algorithm 1's complexity is exponential in m, but it is efficient for short LEs, which is reasonable since longer LEs are hard to interpret. In practice, post-hoc retrieval yields a few hundred LEs (Section 5 discusses how to navigate such a set); a toy version of this retrieval step is sketched after this list.
  • Table 3: Dataset statistics and AUC-PR results. Part (a) lists, per label, the class skew and the number of predicates |P| constructed using hand-crafted dictionaries, following the process described in Section 3. For TREC, we use the standard train/test split to aid comparison, which also exhibits significant class skew (Table 3 (b)); we automatically construct dictionaries by capturing surface forms (from the training set) that discriminate well among its labels, and construct predicates by extracting the same syntactic and semantic arguments stated previously. Methods compared: RuleNN learns k=50 LEs containing up to m=4 predicates each.
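The following is a toy re-creation of the post-hoc LE retrieval described in the Table 2 entry above, under assumed shapes and randomly generated data (not the authors' implementation): alphas[i] is the learned weight vector of conjunct i over the |P| predicates of one CGM, and data[t, p] records whether predicate p holds on instance t of D.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
num_preds, m = 6, 2
# Sparse-ish weights per conjunct: many entries zeroed, as after training.
alphas = [rng.random(num_preds) * (rng.random(num_preds) > 0.6) for _ in range(m)]
# Boolean predicate evaluations over the instances of D.
data = rng.random((20, num_preds)) > 0.5

def retrieve_les(alphas, data):
    """Retrieval in the spirit of Algorithm 1: keep every m-combination of
    predicate indices whose weight (product of the corresponding alpha
    entries) is non-zero and which, read as a conjunction, holds on at
    least one instance in D."""
    m = len(alphas)
    les = []
    for combo in combinations(range(data.shape[1]), m):
        weight = np.prod([alphas[i][p] for i, p in enumerate(combo)])
        if weight > 0 and data[:, list(combo)].all(axis=1).any():
            les.append((combo, float(weight)))
    return les

# With k CGMs, invoke once per CGM and take the union of the returned LEs.
print(retrieve_les(alphas, data))
```

The enumeration is exponential in m, matching the stated complexity, which is why it is only practical for the short LEs that are interpretable anyway.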
Related Work
  • Inductive logic programming (ILP) (Muggleton, 1996) learns rules that perfectly entail the positive examples and reject all negatives. Top-down ILP systems (Muggleton et al., 2008; Corapi et al., 2010; Cropper and Muggleton, 2015), in particular, generate rules before testing them on data. Since a 0-error rule may not exist, noise-tolerant ILP (Muggleton et al., 2018) instead learns rules that minimize error, which is better suited to noisy real-world scenarios. We compare RuleNN against top-down and noise-tolerant ILP in Section 5.

    Markov logic networks (MLNs) (Richardson and Domingos, 2006), a member of statistical relational learning (StarAI) (Getoor and Taskar, 2007), comprise weighted rules that extend Markov random fields (Pearl, 1988) to the first-order setting. A long line of work exploring various techniques culminated in the LSM heuristic (Kok and Domingos, 2010), which learns MLN rules before estimating parameters. Since such a stepwise approach can be computationally expensive, BoostSRL (Khot et al., 2011) jointly learns rules and parameters by approximating the gradient using functional gradient boosting (Friedman, 2001). RuleNN replaces logical operations with differentiable functions, thus learning LEs end-to-end without approximations. Section 5 reports results for LSM and BoostSRL.
Study Subjects and Analysis
data scientists: 4
5.3 Human-Machine Co-creation: User Study. Having shown that RuleNN learns explainable, high-quality LEs, we were interested in finding out whether domain experts agree and, in particular, whether such interaction improves the LEs. Four data scientists with knowledge of NLU and FOL were given the 188 LEs learned for C; the goal was to select LEs whose semantics could be verified.

participants: 3
This reduction from 188 LEs translates to 96% model compression and shows that, with human expertise, RuleNN's LE sets can be made smaller and thus more interpretable. To model collaborative and iterative development in the real world, we take the union of the LEs kept by each 3-participant subset of the 4 participants (C(4,3) = 4 subsets), obtaining 4 explainable models; a small sketch of this construction follows below. As Figure 8 (e) shows, 3 of these outperform BiLSTM by ≈ 25% in terms of F-measure (the harmonic mean of precision and recall).
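As a hedged sketch of this union-of-subsets construction (the LE identifiers below are made up; only the combinatorics mirrors the study):

```python
from itertools import combinations

# Illustrative only: hypothetical LE identifiers kept by each of the 4
# participants after verification (the real study started from 188 LEs).
kept = {
    "p1": {"le_3", "le_7", "le_9"},
    "p2": {"le_3", "le_4"},
    "p3": {"le_7", "le_8"},
    "p4": {"le_1", "le_9"},
}

# One explainable model per 3-participant subset: C(4, 3) = 4 models,
# each the union of the LEs its participants kept.
models = {
    subset: set().union(*(kept[p] for p in subset))
    for subset in combinations(sorted(kept), 3)
}
for subset, les in sorted(models.items()):
    print("+".join(subset), "->", sorted(les))
```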

References
  • Jaume Amores. 2013. Multiple instance classification: Review, taxonomy and comparative study. Artificial Intelligence.
  • Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias. www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Luke Bjerring and Eibe Frank. 2011. Beyond trees: Adopting MITI to learn rules and ensemble classifiers for multi-instance data. In International Conference on Advances in Artificial Intelligence.
  • BlackBoxNLP. 2019. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
  • Hendrik Blockeel, David Page, and Ashwin Srinivasan. 2005. Multi-instance tree learning. In ICML.
  • Lingyang Chu, Xia Hu, Juhua Hu, Lanjun Wang, and Jian Pei. 2018. Exact and consistent interpretation for piecewise linear neural networks: A closed form solution. In KDD.
  • Domenico Corapi, Alessandra Russo, and Emil Lupu. 2010. Inductive logic programming as abductive search. LIPIcs-Leibniz International Proceedings in Informatics, Vol. 7. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
  • Andrew Cropper and Stephen H. Muggleton. 2015. Logical minimisation of meta-rules within meta-interpretive learning. In ILP.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • Jerome H. Friedman. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics.
  • Lise Getoor and Ben Taskar. 2007. Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning). The MIT Press.
  • Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys.
  • Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. 2020. Neural module networks for reasoning over text. In ICLR.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.
  • Dan Jurafsky and James H. Martin. 2014. Speech and Language Processing, volume 3. Prentice Hall, Pearson Education International.
  • Seyed Mehran Kazemi and David Poole. 2018. RelNN: A deep neural model for relational learning. In AAAI.
  • Tushar Khot, Sriraam Natarajan, Kristian Kersting, and Jude Shavlik. 2011. Learning Markov logic networks via functional gradient boosting. In ICDM.
  • Stanley Kok and Pedro Domingos. 2010. Learning Markov logic networks using structural motifs. In ICML.
  • Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2008. SystemT: A system for declarative information extraction. ACM SIGMOD Record.
  • Legal Categories. https://cloud.ibm.com/docs/services/discovery?topic=discovery-contract_parsing#contract_categories.
  • Xin Li and Dan Roth. 2002. Learning question classifiers. In COLING.
  • Adam Liptak. 2017. Sent to prison by a software program's secret algorithms. www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret-algorithms.html.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NeurIPS.
  • Mehrad Moradshahi, Hamid Palangi, Monica S. Lam, Paul Smolensky, and Jianfeng Gao. 2019. HUBERT untangles BERT to improve transfer across NLP tasks. arXiv preprint arXiv:1910.12647.
  • Stephen Muggleton. 1996. Learning from positive data. In Workshop on ILP.
  • Stephen Muggleton, Wang-Zhou Dai, Claude Sammut, Alireza Tamaddoni-Nezhad, Jing Wen, and Zhi-Hua Zhou. 2018. Meta-interpretive learning from noisy images. Machine Learning.
  • Stephen H. Muggleton, Jose Carlos Almeida Santos, and Alireza Tamaddoni-Nezhad. 2008. TopLog: ILP using a logic program declarative bias. In International Conference on Logic Programming.
  • Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics.
  • Nikolaos Pappas and Andrei Popescu-Belis. 2014. Explaining the stars: Weighted multiple-instance learning for aspect-based sentiment analysis. In EMNLP.
  • Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In KDD.
  • Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning.
  • Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end differentiable proving. In NeurIPS.
  • Sofia Serrano and Noah A. Smith. 2019. Is attention interpretable? In ACL.
  • Parag Singla and Pedro Domingos. 2006. Entity resolution with Markov logic. In ICDM.
  • Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. 2018. Lifted relational neural networks: Efficient learning of latent relational structures. JAIR.
  • Akash Srivastava and Charles Sutton. 2017. Autoencoding variational inference for topic models. In ICLR.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR.
  • Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, and Wenyu Liu. 2018. Revisiting multiple instance neural networks. Pattern Recognition.
  • Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In EMNLP.
  • Fan Yang, Zhilin Yang, and William W. Cohen. 2017. Differentiable learning of logical rules for knowledge base reasoning. In NeurIPS.
  • Yiwei Yang, Eser Kandogan, Yunyao Li, Walter S. Lasecki, and Prithviraj Sen. 2019. HEIDL: Learning linguistic expressions with deep learning and human-in-the-loop. In ACL.
Authors
Marina Danilevsky
Yunyao Li
Siddhartha Brahma