Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

EMNLP 2020, pp. 4640–4652

"For the Short Modifier experiment we find no few-shot learning for the n-gram models, but moderate few-shot learning for all neural models tested."


Abstract

Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. Fi…

Introduction
  • Recurrent Neural Network language models (Elman, 1990; Hochreiter and Schmidhuber, 1997) have been shown to learn many aspects of natural language syntax, including a number of long-distance dependencies and representations of incremental syntactic state (Marvin and Linzen, 2018; Wilcox et al, 2018; Futrell et al, 2018). (Miguel conducted this work while at IBM Research. Scripts and data for this paper can be found online at https://github.com/wilcoxeg/fsl_invar.)
  • Previous studies have not investigated the relationship between a token’s frequency in the training corpus and the syntactic properties models learn about it (a frequency-bucketing sketch follows this list).
  • Many semantic-syntactic rules that govern word co-occurrence in one form, such as a verb’s argument structure relationships, hold uniformly across transformations.
  • It remains an open question whether models learn grammatical rules invariant to their surface realization, a property the authors call syntactic invariance.
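
A minimal sketch of such frequency bucketing in Python; the helper name exposure_buckets and the cut-off values are illustrative assumptions, not the paper’s released scripts.

    from collections import Counter

    def exposure_buckets(corpus_tokens, boundaries=(1, 2, 4, 8, 16, 32, 64, 128)):
        # Group vocabulary items by training-corpus frequency: a token seen
        # n times goes into the first bucket whose upper bound is >= n.
        # Tokens more frequent than the largest cut-off are ignored here.
        counts = Counter(corpus_tokens)
        buckets = {b: set() for b in boundaries}
        for token, n in counts.items():
            for b in boundaries:
                if n <= b:
                    buckets[b].add(token)
                    break
        return buckets

    # Toy usage: "dax" occurs once, "cat" three times.
    tokens = "the dax sleeps the cat sleeps the cat eats the cat".split()
    print({b: sorted(v) for b, v in exposure_buckets(tokens).items() if v})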
Highlights
  • Recurrent Neural Network language models (Elman, 1990; Hochreiter and Schmidhuber, 1997) have been shown to learn many aspects of natural language syntax, including a number of long-distance dependencies and representations of incremental syntactic state (Marvin and Linzen, 2018; Wilcox et al, 2018; Futrell et al, 2018).
  • We assess neural models’ ability to make robust syntactic generalizations about a token’s nominal number or verbal argument structure based on minimal exposure to the token during training.
  • People apply the same properties across different constructions, meaning that their representations of the syntactic features of a word are in some sense invariant to the grammatical context of that word.
  • We have tested the few-shot learning capabilities of neural language models, as well as whether these models can learn grammatical representations that are invariant to syntactic transformation.
  • We addressed neural models’ ability to learn nominal number, introducing a novel testing paradigm that leveraged polar questions to assess subject/verb number agreement learning in syntactically transformed settings.
Methods
  • In order to assess the learning outcomes of neural LMs, the authors adopt the Psycholinguistic Assessment Paradigm (Linzen et al, 2016; Futrell et al, 2018).
  • In this paradigm, models are exposed to sentences that reveal the syntactic generalizations learned during training; a sketch of the procedure follows.
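
To make the paradigm concrete, here is a minimal, self-contained sketch of the polar-question number-agreement test: build a grammatical/ungrammatical minimal pair around a made-up noun, then check whether a scorer assigns lower surprisal to the grammatical variant in the critical region. The templates, the toy scorer, and the noun "dax" are invented stand-ins; a real evaluation would substitute the trained LSTM/RNNG’s incremental per-token log probabilities.

    import math

    def polar_question_pair(noun, number):
        # Grammatical vs. agreement-violating polar question for `noun`;
        # the fronted auxiliary must agree with the subject's number.
        good_aux, bad_aux = ("Is", "Are") if number == "sg" else ("Are", "Is")
        return f"{good_aux} the {noun} red ?", f"{bad_aux} the {noun} red ?"

    def lm_logprobs(tokens):
        # Stand-in per-token log probabilities; swap in a trained LM here.
        toy = {"Is": math.log(0.20), "Are": math.log(0.05)}
        return [toy.get(t, math.log(0.01)) for t in tokens]

    def region_surprisal(sentence, region):
        # Summed surprisal (negative log probability) over the critical region.
        tokens = sentence.split()
        return -sum(lp for t, lp in zip(tokens, lm_logprobs(tokens)) if t in region)

    # "dax" is a novel noun that training evidence marked as singular.
    good, bad = polar_question_pair("dax", "sg")
    critical = {"Is", "Are"}  # the agreeing auxiliary is the critical region

    # Success requires the grammatical variant to be strictly less surprising;
    # a tie between the two conditions counts as a failure.
    print(region_surprisal(good, critical) < region_surprisal(bad, critical))  # True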
Results
  • The authors find a significant effect of structural supervision, with both the ActionLSTM and RNNG outperforming the LSTM model (p < 0.05 and p < 0.001 respectively).
  • Results are similar for the Transformed Modifier experiment.
  • In this case, the n-gram model is at 0% accuracy for all buckets; this is because it assigns equal probability to the critical region in each condition, which counts as a “failure.” The ActionLSTM displays moderate generalization (4/8 buckets) and the RNNG and LSTM stronger generalization (7/8 and 8/8 buckets, respectively); a sketch of the per-bucket chance criterion follows this list.
  • Turning to the effects of structural supervision: the authors find that the RNNG and the ActionLSTM generally outperform the LSTM (p < 0.001 for all three, except RNNG/Short Modifier, which is not significant).
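
The per-bucket accuracy claims rest on a comparison against chance. A plausible reconstruction of that criterion, assuming a one-sided binomial test against a 50% baseline (the paper’s exact statistical procedure may differ):

    from scipy.stats import binomtest

    def bucket_above_chance(successes, n_items, chance=0.5, alpha=0.05):
        # One-sided binomial test: is the bucket's accuracy significantly
        # above chance? 0.5 is the baseline for a two-alternative preference.
        result = binomtest(successes, n_items, p=chance, alternative="greater")
        return result.pvalue < alpha

    # Illustrative counts only: 40 test items per exposure bucket.
    for successes in (22, 27, 33):
        print(successes, bucket_above_chance(successes, 40))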
Conclusion
  • The authors have tested the few-shot learning capabilities of neural language models, as well as whether these models can learn grammatical representations that are invariant to syntactic transformation.
  • The authors addressed neural models’ ability to learn nominal number, introducing a novel testing paradigm that leveraged polar questions to assess subject/verb number agreement learning in syntactically transformed settings.
Tables
  • Table 1: Left columns: few-shot learning outcomes, with the results from our tests of syntactic invariance in the bottom quadrant. Colors correspond to the proportion of exposure buckets for which each model achieved accuracy scores significantly above chance, colored by tertiles. Right columns indicate whether the two structurally supervised models outperform the LSTM for each test, where *s indicate the significance level from our statistical tests and !s indicate significantly worse performance than the LSTM.
Related Work
  • Bayesian models of word learning have shown successes in acquiring proper syntactic generalizations from minimal exposure (Tenenbaum and Xu, 2000; Wang et al, 2017); however, it is not clear how well neural network models would exhibit these rapid generalizations. Comparing between neural network architectures, recent work has shown that models enhanced with explicit structural supervision during training produce more humanlike syntactic generalizations (Kuncoro et al, 2017, 2018; Wilcox et al, 2019), but it remains untested whether such supervision helps models learn properties of tokens that occur rarely during training.

    Previous studies have found that Artificial Neural Networks (ANNs) are capable of learning some argument structure paradigms and making correct predictions across multiple frames (Kann et al, 2018); however, these capabilities remain untested for incremental language models. Much has been written about the ability of ANNs to learn number agreement (Linzen et al, 2016; Gulordava et al, 2018; Giulianelli et al, 2018), including their ability to maintain the dependency across different types of intervening material (Marvin and Linzen, 2018) and with coordinated noun phrases (An et al, 2019). Hu et al (2020) find that model architecture, rather than training data size, may contribute most to performance on number agreement and related tasks. Focusing on RNN models, Lakretz et al (2019) find evidence that number agreement is tracked by specific “number” units that work in concert with units that carry more general syntactic information, such as tree depth. Jumelet et al (2019) argue that when learning dependencies, RNNs acquire a default form (which they postulate to be singular and masculine), and that predicting a non-default form requires explicit contrary evidence. Our results support their hypothesis: models are more accurate with singular nouns and transitive verbs seen only a few times in training, behavior that indicates these forms are expected when evidence is sparse.
Funding
  • This work was supported by the MIT-IBM Watson AI Lab
References
  • Aixiu An, Peng Qian, Ethan Wilcox, and Roger Levy. 2019. Representation of constituents in neural language models: Coordination phrase as a case study. arXiv preprint arXiv:1909.04625.
  • R. Harald Baayen, Richard Piepenbrock, and Leon Gulikers. 1995. The CELEX lexical database (release 2). Distributed by the Linguistic Data Consortium, University of Pennsylvania.
  • Eugene Charniak et al. 2016. Parsing as language modeling. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • Noam Chomsky. 1957. Syntactic Structures. Walter de Gruyter.
  • Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Jeffrey L. Elman. 1990. Finding structure in time. Cognitive Science, 14(2):179–211.
  • Richard Futrell, Ethan Wilcox, Takashi Morita, and Roger Levy. 2018. RNNs as psycholinguistic subjects: Syntactic state and grammatical dependency. arXiv preprint arXiv:1809.01329.
  • Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. 2018. Under the hood: Using diagnostic classifiers to investigate and improve how language models track agreement information. arXiv preprint arXiv:1808.08079.
  • Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless green recurrent networks dream hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, and Roger P. Levy. 2020. A systematic assessment of syntactic generalization in neural language models. arXiv preprint arXiv:2005.03692.
  • Jaap Jumelet, Willem Zuidema, and Dieuwke Hupkes. 2019. Analysing neural language models: Contextual decomposition reveals default reasoning in number and gender assignment. arXiv preprint arXiv:1909.08975.
  • Katharina Kann, Alex Warstadt, Adina Williams, and Samuel R. Bowman. 2018. Verb argument structure alternations in word and sentence embeddings. arXiv preprint arXiv:1811.10773.
  • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, and Noah A. Smith. 2017. What do recurrent neural network grammars learn about syntax? In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics.
  • Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, and Phil Blunsom. 2018. LSTMs can learn syntax-sensitive dependencies well, but modeling structure makes them better. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1426–1436.
  • Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, and Marco Baroni. 2019. The emergence of number and syntax units in LSTM language models. arXiv preprint arXiv:1903.07435.
  • Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.
  • Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
  • Rebecca Marvin and Tal Linzen. 2018. Targeted syntactic evaluation of language models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal Dependencies: An improved representation for natural language understanding tasks. In Proceedings of the 10th International Conference on Language Resources and Evaluation.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
  • Mitchell Stern, Daniel Fried, and Dan Klein. 2017. Effective inference for generative neural parsing. arXiv preprint arXiv:1707.08976.
  • Andreas Stolcke. 2002. SRILM: an extensible language modeling toolkit. In Proceedings of the 7th International Conference on Spoken Language Processing.
  • Joshua B. Tenenbaum and Fei Xu. 2000. Word learning as Bayesian inference. In Proceedings of the Annual Meeting of the Cognitive Science Society.
  • Marten van Schijndel and Tal Linzen. 2018. A neural model of adaptation in reading. arXiv preprint arXiv:1808.09930.
  • Su Wang, Stephen Roller, and Katrin Erk. 2017. Distributional modeling on a diet: One-shot word learning from text only. In Proceedings of the 8th International Joint Conference on Natural Language Processing.
  • Ethan Wilcox, Roger Levy, Takashi Morita, and Richard Futrell. 2018. What do RNN language models learn about filler-gap dependencies? In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
  • Ethan Wilcox, Peng Qian, Richard Futrell, Miguel Ballesteros, and Roger Levy. 2019. Structural supervision improves learning of non-local grammatical dependencies. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • George Kingsley Zipf. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press.