Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start

EMNLP 2020, pp. 8229–8239.

Abstract

A standard way to address different NLP problems is by first constructing a problem-specific dataset, then building a model to fit this dataset. To build the ultimate artificial intelligence, we desire a single machine that can handle diverse new problems, for which task-specific annotations are limited. We bring up textual entailment as ...

Introduction
  • “Universal NLP” here means using a single machine to address diverse NLP problems.
  • This is different from using the same machine-learning algorithm, such as convolutional nets, for every task, because the latter still results in task-specific models that cannot solve other tasks.
  • A reasonable attempt is to map diverse NLP tasks into a common learning problem: solving this common problem is equivalent to solving any downstream NLP task, even tasks that are new or have insufficient annotations (see the sketch below).
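
To make this mapping concrete, here is a minimal sketch, in Python, of how diverse tasks can be recast as premise/hypothesis pairs for a single entailment model. The hypothesis templates are illustrative assumptions, not the paper's exact conversion rules:

```python
# Sketch: recasting diverse NLP tasks as inputs to one entailment model.
# The hypothesis templates below are hypothetical, for illustration only.

def qa_to_entailment(passage: str, question: str, candidate: str):
    # QA: the passage is the premise; the question plus a candidate answer
    # is verbalized as the hypothesis.
    return passage, f"{question} The answer is {candidate}."

def coref_to_entailment(context: str, pronoun: str, entity: str):
    # Coreference: the hypothesis asserts that the pronoun refers to the entity.
    return context, f'Here, "{pronoun}" refers to {entity}.'

def classification_to_entailment(text: str, label_name: str):
    # Text classification: the hypothesis verbalizes a candidate label.
    return text, f"This text is about {label_name}."

# One shared entailment model scores every (premise, hypothesis) pair;
# the candidate whose hypothesis receives the highest entailment
# probability becomes the task-level prediction.
```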
Highlights
  • Nowadays, the whole NLP journey has been broken down into innumerable sub-tasks
  • We argue that textual entailment matters when the target NLP task has insufficient annotations; in this way, NLP tasks that share the same inference pattern, but whose annotations are insufficient to build task-specific models, can be handled by a unified entailment system
  • We reported the performance of handling open entailment and other NLP tasks with the entailment approach. One may still ask: for any NLP task, is it better to reformulate it as textual entailment? In this subsection, we compare textual entailment with other popular systems in modeling the coreference task, which usually is not modeled in an entailment framework
  • We feed each instance in the GAP dataset into RoBERTa, which generates a representation for each token in the instance
  • To obtain representations for the pronoun and an entity candidate, we sum up the representations of all tokens belonging to the pronoun or the entity string
  • We studied how to build a textual entailment system that can work in open domains given only a couple of examples, and studied the common patterns in a variety of NLP tasks in which textual entailment can be used as a unified solver
Methods
  • 3.1 Problem formulation

    Given the large-scale generic textual entailment dataset MNLI (Williams et al., 2018) and a few examples from a target domain or task, the authors build an entailment predictor that works well in the target domain/task even when only a few examples are available.

    The inputs include: MNLI, the example set (i.e., k examples for each type in {“entailment”, “nonentailment”} or {“entailment”, “neutral”, “contradiction”} if applicable).
  • The output is an entailment classifier, predicting a label for each instance in the new domain/task.
  • Note that those examples must first be converted into labeled entailment instances if the target task is not a standard entailment problem.
  • The entailment-style outputs can be converted to the prediction format required by the target tasks, as introduced in Section 4.2.
  • The authors refer to MNLI as S, and the new domain or task as T.
  • Before launching the full introduction of UFO-ENTAIL, the authors first give a brief description of the system (illustrated in a figure in the paper); a sketch of the data setup it assumes follows this list.
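
As a rough illustration of this formulation, the sketch below assembles the training data: the full source set S (MNLI) plus k labeled examples per class from the target T, already converted to entailment format. The function name and the dict-based example format are our assumptions, not the authors' code:

```python
import random

def build_few_shot_training_set(mnli_examples, target_examples_by_class, k=10):
    """Combine the generic source set S (MNLI) with k entailment-style
    examples per class from the target domain/task T.

    Each example is a dict: {"premise": str, "hypothesis": str, "label": str},
    where label is in {"entailment", "non-entailment"} (or the 3-way MNLI set).
    """
    train = list(mnli_examples)               # large-scale source annotations
    for label, examples in target_examples_by_class.items():
        assert all(ex["label"] == label for ex in examples)
        train.extend(examples[:k])            # only k target examples per class
    random.shuffle(train)
    return train                              # input to the entailment classifier
```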
Results
  • The authors compare UFO-ENTAIL with a typical metric-based meta-learning approach: prototypical networks.
  • The prototypical network is worse than STILTS on the two entailment benchmarks while mostly outperforming STILTS slightly on the QA and coreference tasks.
  • Prototypical network is essentially a nearest neighbor algorithm (Yin, 2020) pretrained on S only.
  • A test example in T is classified by comparing it with the T-specific class representations constructed from the k examples (a minimal sketch of this procedure follows this list).
  • A pretrained nearest-neighbor algorithm does not necessarily work well if S and T are too distinct.
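
For reference, here is a minimal sketch of that nearest-neighbor behavior, assuming an encoder pretrained on S has already produced embeddings; the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def prototypical_predict(support_by_class, query):
    """support_by_class: {label: np.ndarray of shape (k, d)}, the embeddings
    of the k target-task examples per class; query: np.ndarray of shape (d,).
    Returns the label whose prototype (class mean) is nearest to the query.
    """
    best_label, best_dist = None, float("inf")
    for label, embeddings in support_by_class.items():
        prototype = embeddings.mean(axis=0)        # T-specific class representation
        dist = np.linalg.norm(query - prototype)   # Euclidean distance to prototype
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

If S and T are too distinct, the S-pretrained encoder embeds T examples poorly and the nearest prototype is uninformative, which is consistent with the observation above.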
Conclusion
  • In Table 1, the authors reported the performance of handling open entailment and other NLP tasks with the entailment approach.
  • The authors feed each instance in the GAP dataset into RoBERTa, which generates a representation for each token in the instance (a minimal sketch of this step follows this list).
  • The authors do binary classification for each (pronoun, entity) pair.
  • The authors compare this system with the entailment approach (i.e., “train on target data”) when using different sizes of training set: [10%, 20%, ..., 100%].
  • The result for each percentage is the average of three runs with different seeds.
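
Below is a sketch of the span-representation step described above, written against the HuggingFace transformers API; treating the first occurrence of the span string as the mention is our simplifying assumption, not necessarily the authors' exact procedure:

```python
import torch
from transformers import RobertaModel, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

def span_representation(text: str, span: str) -> torch.Tensor:
    """Sum the RoBERTa token representations covering `span` (a pronoun or
    an entity string). Assumes `span` occurs in `text`; uses its first
    occurrence."""
    start = text.index(span)
    end = start + len(span)
    enc = tokenizer(text, return_offsets_mapping=True, return_tensors="pt")
    offsets = enc.pop("offset_mapping")[0]            # (seq_len, 2) char offsets
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]    # (seq_len, d) token vectors
    # A token belongs to the span if its character range overlaps the span.
    mask = (offsets[:, 0] < end) & (offsets[:, 1] > start)
    return hidden[mask].sum(dim=0)                    # summed span representation
```

The summed pronoun and entity vectors can then be concatenated and fed to a binary classifier over (pronoun, entity) pairs, matching the classification system described above.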
Tables
  • Table 1: Applying UFO-ENTAIL to two entailment benchmarks (RTE and SciTail) and two other NLP tasks (question answering (QA) and coreference resolution (Coref.)), each providing k examples (k = {1, 3, 5, 10}). Numbers for “STILTS (SOTA)” are upper-bound performance using full labeled data; bold numbers are our top numbers when the few-shot hyperparameter k <= 10
Related Work
  • Textual Entailment. Textual entailment was first studied in Dagan et al. (2005) and the main focus in the early stages was to study lexical and some syntactic features. In the past few years, the research on textual entailment has been driven by the creation of large-scale datasets, such as SNLI (Bowman et al., 2015), science domain SciTail (Khot et al., 2018), and multi-genre MNLI (Williams et al., 2018). Representative work includes the first attentive recurrent neural network (Rocktaschel et al., 2016) and its followers (Wang and Jiang, 2016; Wang et al., 2017), as well as the attentive convolutional networks such as attentive pooling (dos Santos et al., 2016) and attentive convolution (Yin and Schutze, 2018), and self-attentive large-scale language models like BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019). All these studies result in systems that are overly tailored to the datasets.

    Our work differs in that we care more about few-shot applications of textual entailment, assuming that a new domain or an NLP task is not provided with rich annotated data.
Analysis
  • (i) When using all the GAP training data, both the entailment and the (pronoun, entity) classification systems reach pretty similar results; (ii) when the training size is below 30%, the non-entailment approach shows better performance
References
  • Trapit Bansal, Rishikesh Jha, and Andrew McCallum. 2019. Learning to few-shot learn across diverse natural language classification tasks. CoRR, abs/1911.03863.
  • Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In EMNLP, pages 632–642.
  • Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In NAACL-HLT, pages 2924–2936.
  • Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment, First PASCAL Machine Learning Challenges Workshop, pages 177–190.
  • Hal Daume III. 2007. Frustratingly easy domain adaptation. In ACL, pages 256–263.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. In NAACL, pages 107–112.
  • Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2018. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In EMNLP, pages 4803–4809.
  • Bingyi Kang and Jiashi Feng. 2018. Transferable meta learning across domains. In UAI, pages 177–187.
  • Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Unifying question answering and text classification via span extraction. CoRR, abs/1904.09286.
  • Tushar Khot, Ashish Sabharwal, and Peter Clark. 2018. SciTail: A textual entailment dataset from science question answering. In AAAI, pages 5189–5197.
  • Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. 2015. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
  • Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The natural language decathlon: Multitask learning as question answering. CoRR, abs/1806.08730.
  • Timothy A. Miller. 2019. Simplified neural unsupervised domain adaptation. In NAACL-HLT, pages 414–419.
  • Jason Phang, Thibault Fevry, and Samuel R. Bowman. 2018. Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks. CoRR, abs/1811.01088.
  • Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, and Benjamin Van Durme. 2018. Collecting diverse natural language inference problems for sentence representation evaluation. In EMNLP, pages 67–81.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR, abs/1910.10683.
  • Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. 2018. Meta-learning for semi-supervised few-shot classification. In ICLR.
  • Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In EMNLP, pages 193–203.
  • Tim Rocktaschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In ICLR.
  • Cicero Nogueira dos Santos, Ming Tan, Bing Xiang, and Bowen Zhou. 2016. Attentive pooling networks. CoRR, abs/1602.03609.
  • Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. 2019. Social IQa: Commonsense reasoning about social interactions. In EMNLP-IJCNLP, pages 4462–4472.
  • Jake Snell, Kevin Swersky, and Richard S. Zemel. 2017. Prototypical networks for few-shot learning. In NeurIPS, pages 4077–4087.
  • Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy M. Hospedales. 2018. Learning to compare: Relation network for few-shot learning. In CVPR, pages 1199–1208.
  • Adam Trischler, Zheng Ye, Xingdi Yuan, Jing He, and Philip Bachman. 2016. A parallel-hierarchical model for machine comprehension on sparse data. In ACL.
  • Oriol Vinyals, Charles Blundell, Tim Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. 2016. Matching networks for one shot learning. In NeurIPS, pages 3630–3638.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In ICLR.
  • Shuohang Wang and Jing Jiang. 2016. Learning natural language inference with LSTM. In NAACL, pages 1442–1451.
  • Zhiguo Wang, Wael Hamza, and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. In IJCAI, pages 4144–4150.
  • Kellie Webster, Marta Recasens, Vera Axelrod, and Jason Baldridge. 2018. Mind the GAP: A balanced corpus of gendered ambiguous pronouns. TACL, 6:605–617.
  • Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In NAACL-HLT, pages 1112–1122.
  • Wenpeng Yin. 2020. Meta-learning for few-shot natural language processing: A survey. CoRR, abs/2007.09604.
  • Wenpeng Yin, Jamaal Hay, and Dan Roth. 2019. Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. In EMNLP-IJCNLP, pages 3905–3914.
  • Wenpeng Yin and Hinrich Schutze. 2018. Attentive convolution: Equipping CNNs with RNN-style attention mechanisms. TACL, 6:687–702.
  • Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, Gerald Tesauro, Haoyu Wang, and Bowen Zhou. 2018. Diverse few-shot text classification with multiple metrics. In NAACL-HLT, pages 1206–1215.