
ConjNLI: Natural Language Inference Over Conjunctive Sentences

EMNLP 2020, pp. 8240–8252.


Abstract

Reasoning about conjuncts in conjunctive sentences is important for a deeper understanding of conjunctions in English and also how their usages and semantics differ from conjunctive and disjunctive boolean logic. Existing NLI stress tests do not consider non-boolean usages of conjunctions and use templates for testing such model knowledge…

Introduction
  • Coordinating conjunctions are a common syntactic phenomenon in English: 38.8% of sentences in the Penn Tree Bank contain at least one of the coordinating words “and”, “or”, and “but” (Marcus et al., 1993).
  • Recent years have seen significant progress in the task of Natural Language Inference (NLI) through the development of large-scale datasets like SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018).
  • Inference over conjuncts directly translates to the boolean and non-boolean semantics of conjunctions.
Highlights
  • Coordinating conjunctions are a common syntactic phenomenon in English: 38.8% of sentences in the Penn Tree Bank contain at least one of the coordinating words “and”, “or”, and “but” (Marcus et al., 1993)
  • We propose a predicate-aware RoBERTa model, built on top of a standard RoBERTa model for NLI (see the sketch after this list)
  • We observe a similar trend for both MNLI and CONJNLI, with MNLI-trained RoBERTa being the best performing model. This is perhaps unsurprising, as MNLI contains more complex inference examples compared to SNLI
  • We experimented with older models like ESIM (Chen et al., 2017), and the accuracy on CONJNLI was much worse, at 53.10%
  • We presented CONJNLI, a new stress-test dataset for NLI in conjunctive sentences (“and”, “or”, “but”, “nor”) in the presence of negations and quantifiers and requiring diverse “boolean” and “non-boolean” inferences over conjuncts
  • The accuracies for RoBERTa, Iterative Adversarial Fine-Tuning (IAFT) and Predicate-aware RoBERTa (PA) on the boolean subset are 68%, 72% and 69% respectively, while on the non-boolean subset they are 58%, 61% and 58% respectively. Based on these results, we make some key observations: (1) non-boolean accuracy for all models is about 10% lower than the boolean counterpart, revealing the hardness of the dataset; (2) IAFT improves both the boolean and non-boolean subsets because of the non-boolean heuristics used in creating its adversarial training data; (3) PA only marginally improves the boolean subset, suggesting the need for better semantic models in future work
  • We presented some initial solutions via adversarial training and a predicate-aware RoBERTa model, and achieved some reasonable performance gains on CONJNLI
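
The summary does not spell out the predicate-aware architecture, so below is a hypothetical sketch of one way such a model could be wired up: the pooled RoBERTa representation of a premise-hypothesis pair is concatenated with an embedding of SRL predicate ids before classification. The class name, embedding scheme, and dimensions are our illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel

class PredicateAwareNLI(nn.Module):
    """Hypothetical predicate-aware NLI classifier: fuses the pooled
    RoBERTa representation of a premise-hypothesis pair with embeddings
    of SRL predicate ids extracted from both sentences."""
    def __init__(self, num_predicates=10000, pred_dim=64, num_labels=3):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-large")
        hidden = self.encoder.config.hidden_size
        self.pred_emb = nn.Embedding(num_predicates, pred_dim)
        self.classifier = nn.Linear(hidden + pred_dim, num_labels)

    def forward(self, input_ids, attention_mask, predicate_ids):
        # Representation of the <s> token over the concatenated pair.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]
        # Average-pool the predicate-id embeddings for the pair.
        pred = self.pred_emb(predicate_ids).mean(dim=1)
        return self.classifier(torch.cat([cls, pred], dim=-1))
```

An off-the-shelf SRL tagger (e.g., the BERT-based one of Shi and Lin, 2019) would supply the predicate ids, in the spirit of the SRL examples in Table 5.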
Methods
  • Adversarial Methods in NLP: Adversarial training for robustifying neural models has been proposed in many NLP tasks, most notably in QA (Jia and Liang, 2017; Wang and Bansal, 2018) and NLI (Nie et al., 2019).

    [Figure: the dataset-creation pipeline (Conjunctive Sentence Selection → Conjuncts Identification → NLI Pair Creation → Manual Validation + Expert Annotation), illustrated with the sentence “He is a Worcester resident and a member of the Democratic Family.”, whose conjuncts are “a Worcester resident” and “a member of the Democratic Family”, yielding the pair (“He is a Worcester resident and a member of the Democratic Family.”, “He is a member of the Democratic Family.”) labeled Entailment.]
  • The authors first try to automatically create some training data to train models for the challenging CONJNLI stress-test, and show the limits of such rule-based adversarial training methods.
  • For this automated training data creation, the authors follow the same process as Section 3 but replace the expert human-annotation phase with automated boolean rules and some initial heuristics for non-boolean semantics, so as to assign labels to these pairs automatically (a minimal rule sketch follows this list).
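
As a rough illustration of what such automated boolean labeling might look like, here is a minimal sketch. The function name and the exact rule set are our simplifications of the description above, not the authors' full rules.

```python
# Illustrative sketch of boolean labeling rules for automatically
# creating adversarial NLI training pairs (a simplification, not the
# paper's full rule set). The hypothesis is assumed to be the premise
# with one conjunct of the given coordinating conjunction removed.

def label_conjunct_drop(conjunction: str, negated: bool) -> str:
    if conjunction in ("and", "but"):
        # Boolean "and": each conjunct is entailed on its own, unless the
        # conjunction is under negation: by De Morgan, not(A and B) is
        # (not A) or (not B), so dropping a conjunct is no longer safe.
        return "neutral" if negated else "entailment"
    if conjunction == "or":
        # Boolean "or": either conjunct alone is possible but not certain.
        return "neutral"
    if conjunction == "nor":
        # "neither A nor B" entails the negation of each conjunct alone.
        return "entailment"
    return "neutral"

# e.g. "He is a Worcester resident and a member of the Democratic Family."
# with the first conjunct dropped -> "entailment".
print(label_conjunct_drop("and", negated=False))
```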
Results
  • The authors perform experiments on three datasets: (1) CONJNLI, (2) SNLI (Bowman et al., 2015), and (3) MNLI (Williams et al., 2018).
  • The authors observe a similar trend for both MNLI and CONJNLI, with MNLI-trained RoBERTa being the best performing model.
  • This is perhaps unsurprising, as MNLI contains more complex inference examples compared to SNLI.
  • All the successive experiments are conducted using RoBERTa with MNLI as the base training data, owing to its superior performance (an evaluation sketch follows this list).
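
For concreteness, here is a minimal sketch of evaluating an MNLI-fine-tuned RoBERTa on a CONJNLI-style premise-hypothesis pair using HuggingFace Transformers (Wolf et al., 2019). The public roberta-large-mnli checkpoint and its label order are our assumptions for illustration; the authors' exact training setup may differ.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Public MNLI-fine-tuned RoBERTa checkpoint (assumed for illustration).
tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

premise = "He is a Worcester resident and a member of the Democratic Family."
hypothesis = "He is a member of the Democratic Family."

inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Label order of the roberta-large-mnli checkpoint.
labels = ["contradiction", "neutral", "entailment"]
print(labels[logits.argmax(dim=-1).item()])
```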
Conclusion
  • The authors presented CONJNLI, a new stress-test dataset for NLI in conjunctive sentences (“and”, “or”, “but”, “nor”) in the presence of negations and quantifiers and requiring diverse “boolean” and “non-boolean” inferences over conjuncts.
  • The authors presented some initial solutions via adversarial training and a predicate-aware RoBERTa model, and achieved some reasonable performance gains on CONJNLI.
  • The authors show the limitations of the proposed methods, thereby encouraging future work on CONJNLI for a better understanding of conjunctive semantics.
Tables
  • Table1: Examples from our CONJNLI dataset consisting of single and multiple occurrences of different coordinating conjunctions (and, or, but), boolean or non-boolean in the presence of negations and quantifiers. Typical SNLI and MNLI examples do not require inference over conjuncts. † = Non-boolean usages of different conjunctions
  • Table2: Dataset splits of CONJNLI
  • Table3: Data analysis by conjunction types, presence of quantifiers and negations
  • Table4: CONJNLI sentences consist of varied syntactic conjunct categories (bolded). CT = Conjunct Types, NP = Noun Phrase, VP = Verb Phrase, AdvP = Adverbial Phrase
  • Table5: Two examples from CONJNLI where SRL tags can help the model predict the correct label
  • Table6: Comparison of BERT and RoBERTa trained on SNLI and MNLI and tested on the respective dev sets and CONJNLI. MNLI Dev (MD) results are in matched/mismatched format. SD = SNLI Dev, CD = Conj Dev, CT = Conj Test
  • Table7: Effectiveness of IAFT over AFT and other baseline models
  • Table8: Comparison of all our final models on CONJNLI and MNLI
  • Table9: Comparison of all models on the subset of each conjunction type of CONJNLI
  • Table10: Some examples from CONJNLI with gold labels and explanations, used for training the annotators
  • Table11: Examples from CONJNLI showing what each model is good at and what remains challenging for all models
Related work
  • Our work is positioned at the intersection of understanding the semantics of conjunctions in English and their connection to NLI.

    Conjunctions in English. There is a long history of analyzing the nuances of coordinating conjunctions in English and how they compare to boolean and non-boolean semantics (Gleitman, 1965; Keenan and Faltz, 2012). Linguistic studies have shown that noun phrase conjuncts joined by “and” do not always behave in a boolean manner (Massey, 1976; Hoeksema, 1988; Krifka, 1990). In the NLP community, studies on conjunctions have mostly been limited to treating them as a syntactic phenomenon. One of the popular tasks is conjunct boundary identification (Agarwal and Boggess, 1992). Ficler and Goldberg (2016a) show that state-of-the-art parsers often make mistakes in identifying conjuncts correctly, and neural models have been developed to accomplish this (Ficler and Goldberg, 2016b; Teranishi et al., 2019). Saha and Mausam (2018) also identify conjuncts to break conjunctive sentences into simple ones for better downstream Open IE (Banko et al., 2007). In contrast, we study the semantics of conjunctions through our challenging dataset for NLI.
Funding
  • This work was supported by DARPA MCS Grant N66001-19-2-4031, NSF-CAREER Award 1846185, DARPA KAIROS Grant FA8750-19-2-1004, and a Munroe & Rebecca Cobey Fellowship.
  • The views are those of the authors and not of the funding agencies.
Study subjects and analysis
men and women: 5
Although SNLI has 30% of samples with conjunctions, most of these examples do not require inference over the conjuncts that are connected by the coordinating word. On a random sample of 100 conjunctive examples from SNLI, we find that 72% of them have the conjuncts unchanged between the premise and the hypothesis (e.g., “Man and woman sitting on the sidewalk” → “Man and woman are sitting”) and there are almost no examples with non-boolean conjunctions (e.g., “A total of five men and women are sitting.” → “A total of 5 men are sitting.” (contradiction)). As discussed below, inference over conjuncts directly translates to the boolean and non-boolean semantics of conjunctions.

people: 3185
Second, “non-boolean and” is prevalent in sentences where the conjunct entities together map onto a collective entity, often in the presence of certain trigger words like “total”, “group”, “combined”, etc. (though note that this is not always true). For example, removing the conjunct “flooding” from the sentence “In total, the flooding and landslides killed 3,185 people in China.” should lead to a contradiction. We look for such trigger words in the sentence and heuristically assign a contradiction label to the pair (see the sketch below).
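
A minimal sketch of this trigger-word heuristic follows; the trigger list and function name are illustrative stand-ins, not the authors' exact implementation.

```python
# Illustrative trigger-word heuristic for flagging likely non-boolean
# "and": if the premise contains a collectivity cue, dropping a conjunct
# is labeled a contradiction instead of an entailment. The trigger list
# below is a plausible guess, not the authors' exact set.
NON_BOOLEAN_TRIGGERS = {"total", "group", "combined", "together"}

def heuristic_and_label(premise: str) -> str:
    tokens = {tok.strip(".,").lower() for tok in premise.split()}
    if tokens & NON_BOOLEAN_TRIGGERS:
        return "contradiction"  # conjuncts likely act as a collective entity
    return "entailment"         # default boolean rule for "and"

# "In total, the flooding and landslides killed 3,185 people in China."
# -> "contradiction" once the conjunct "flooding" is removed.
print(heuristic_and_label(
    "In total, the flooding and landslides killed 3,185 people in China."))
```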

datasets: 3
Experiments and Results: We perform experiments on three datasets: (1) CONJNLI, (2) SNLI (Bowman et al., 2015), and (3) MNLI (Williams et al., 2018). The appendix contains details about our experimental setup.

Reference
  • Rajeev Agarwal and Lois Boggess. 1992. A simple but useful approach to conjunct identification. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 15–21.
  • Michele Banko, Michael J Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In IJCAI, volume 7, pages 2670–2676.
  • Samuel Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642.
  • Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, CoNLL ’05.
  • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1657–1668.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Jessica Ficler and Yoav Goldberg. 2016a. Coordination annotation extension in the Penn Tree Bank. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 834–842.
  • Jessica Ficler and Yoav Goldberg. 2016b. A neural network for coordination boundary prediction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 23–32.
  • Atticus Geiger, Ignacio Cases, Lauri Karttunen, and Christopher Potts. 2018. Stress-testing neural models of natural language inference with multiply-quantified sentences. arXiv preprint arXiv:1810.13033.
  • Lila R Gleitman. 1965. Coordinating conjunctions in English. Language, 41(2):260–293.
  • Max Glockner, Vered Shwartz, and Yoav Goldberg. 2018. Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–655.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A Smith. 2018. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112.
  • Jack Hoeksema. 1988. The semantics of non-boolean “and”. Journal of Semantics, 6(1):19–40.
  • Paloma Jeretic, Alex Warstadt, Suvrat Bhooshan, and Adina Williams. 2020. Are natural language inference models IMPPRESsive? Learning IMPlicature and PRESupposition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031.
  • Divyansh Kaushik, Eduard Hovy, and Zachary Lipton. 2020. Learning the difference that makes a difference with counterfactually-augmented data. In International Conference on Learning Representations.
  • Edward L Keenan and Leonard M Faltz. 2012. Boolean semantics for natural language, volume 23. Springer Science & Business Media.
  • Manfred Krifka. 1990. Boolean and non-boolean ‘and’. In Papers from the Second Symposium on Logic and Language, pages 161–188.
  • Nelson F Liu, Roy Schwartz, and Noah A Smith. 2019a. Inoculation by fine-tuning: A method for analyzing challenge datasets. In NAACL.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019b. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics.
  • Gerald J Massey. 1976. Tom, Dick, and Harry, and all the king’s men. American Philosophical Quarterly, 13(2):89–107.
  • Tom McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448.
  • Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress test evaluation for natural language inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353.
  • Yixin Nie, Yicheng Wang, and Mohit Bansal. 2019. Analyzing compositionality-sensitivity of NLI models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6867–6874.
  • Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, and Douwe Kiela. 2020. Adversarial NLI: A new benchmark for natural language understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Hiroki Ouchi, Hiroyuki Shindo, and Yuji Matsumoto. 2018. A span selection model for semantic role labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Deric Pang, Lucy H Lin, and Noah A Smith. 2019. Improving natural language inference with a pretrained parser. arXiv preprint arXiv:1909.08217.
  • Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018a. Collecting diverse natural language inference problems for sentence representation evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 67–81.
  • Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018b. Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191.
  • Abhilasha Ravichander, Aakanksha Naik, Carolyn Rose, and Eduard Hovy. 2019. EQUATE: A benchmark evaluation framework for quantitative reasoning in natural language inference. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 349–361.
  • Kyle Richardson, Hai Hu, Lawrence S Moss, and Ashish Sabharwal. 2020. Probing natural language inference models through semantic fragments. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Alexis Ross and Ellie Pavlick. 2019. How well do NLI models capture verb veridicality? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
  • Swarnadeep Saha and Mausam. 2018. Open information extraction from conjunctive sentences. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2288–2299.
  • Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel. 2018. Behavior analysis of NLI models: Uncovering the influence of three factors on robustness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1975–1985, New Orleans, Louisiana.
  • Peng Shi and Jimmy Lin. 2019. Simple BERT models for relation extraction and semantic role labeling. arXiv preprint arXiv:1904.05255.
  • Hiroki Teranishi, Hiroyuki Shindo, and Yuji Matsumoto. 2019. Decomposed local models for coordinate structure parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3394–3403.
  • Yicheng Wang and Mohit Bansal. 2018. Robust machine comprehension models via adversarial training. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 575–581.
  • Aaron Steven White, Pushpendre Rastogi, Kevin Duh, and Benjamin Van Durme. 2017. Inference is everything: Recasting semantic resources into a unified evaluation framework. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 996–1005.
  • Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et al. 2019. HuggingFace’s Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, and Kentaro Inui. 2020. Do neural models learn systematicity of monotonicity inference in natural language? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Xiang Zhou, Yixin Nie, Hao Tan, and Mohit Bansal. 2020. The curse of performance instability in analysis datasets: Consequences, source, and suggestions. In EMNLP.
  • Example NLI pairs with gold labels (cf. Table 10):
    Premise: “Gilbert was the freshman football coach of Franklin and Marshall College in 1938.” Hypothesis: “Gilbert was the freshman football coach of Franklin College in 1938.” Label: neutral.
    Premise: “It premiered on 27 June 2016 and airs Mon-Fri 10-11pm.” Hypothesis: “It premiered on 28 June 2016 and airs Mon-Fri 10-11pm.” Label: contradiction. Explanation: If it premiered on 27 June, it cannot premiere on 28 June.