KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations

Andrea Santilli
Leonardo Ranaldi
Dario Onorati
Pierfrancesco Tommasino

EMNLP 2020.

Keywords
Long Short-Term Memories, neural network, symbolic syntactic, universal sentence embedding, layer-wise relevance propagation
TL;DR
We introduced KERMIT to show that these interpretations can be effectively used in combination with universal sentence embeddings produced from scratch

Abstract

Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality in downstream tasks due to the success of large-scale textual representation learners. In this paper, we propose KERMIT (Kernel-inspired Encoder with Recursive Mechanism for Interpretable Trees) to embed symbolic syntactic parse trees into artificial neural networks and to visualize how syntax is used in inference.

Introduction
  • Universal sentence embeddings (Conneau et al, 2018), which are task-independent, distributed sentence representations, are redesigning the way linguistic models in natural language processing are defined.
  • Socher et al (2011) have defined the notion of Recursive Neural Networks (RecNNs), that is, recurrent neural networks applied to binary trees.
  • These RecNNs have been used to parse sentences and not to include preexisting syntax in a final task (Socher et al, 2011).
  • Munkhdalai and Yu (2017) have specialized LSTMs for binary and n-ary trees with their Neural Tree Indexers, and Strubell et al (2018) have encoded syntactic information by using multi-head attention within a transformer architecture
Highlights
  • Universal sentence embeddings (Conneau et al, 2018), which are task-independent, distributed sentence representations, are redesigning the way linguistic models in natural language processing are defined
  • We propose KERMIT (Kernel-inspired Encoder with Recursive Mechanism for Interpretable Trees) to embed symbolic syntactic parse trees into artificial neural networks and to visualize how syntax is used in inference (a minimal, illustrative sketch of the tree-embedding idea follows this list)
  • We investigate whether explicit universal syntactic interpretations can be used to improve state-of-the-art universal sentence embeddings and to create neural network architectures where syntax decisions are less obscure and thus syntactically explainable
  • Results from the completely universal experimental setting suggest that universal syntactic interpretations complement syntax in universal sentence embeddings
  • Universal syntactic interpretations are valuable language interpretations, which have been developed in years of study
  • We introduced KERMIT to show that these interpretations can be effectively used in combination with universal sentence embeddings produced from scratch
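The following is a minimal, illustrative sketch of the general idea behind a kernel-inspired tree encoder: enumerate subtrees of a parse tree (here, only its productions, as classic tree kernels do) and sum a fixed pseudo-random vector per subtree to obtain one fixed-size embedding. This is not the paper's exact encoder; the tree format, the function names, and the dimension `d` are assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's exact KERMIT encoder):
# bag-of-subtrees embedding in the spirit of distributed tree kernels.
import hashlib
import numpy as np

d = 1024  # embedding dimension (assumed for illustration)

def label_vector(label: str, dim: int = d) -> np.ndarray:
    """Deterministic pseudo-random unit vector for a subtree signature."""
    seed = int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def productions(tree):
    """List the productions of a tree given as (label, [children]) or a string leaf."""
    if isinstance(tree, str):
        return []
    label, children = tree
    prod = label + " -> " + " ".join(c if isinstance(c, str) else c[0] for c in children)
    out = [prod]
    for c in children:
        out.extend(productions(c))
    return out

def encode_tree(tree) -> np.ndarray:
    """Sum the vectors of all productions: a crude, fixed-size tree embedding."""
    vec = np.zeros(d)
    for prod in productions(tree):
        vec += label_vector(prod)
    return vec

# Example: (S (NP the movie) (VP was great))
parse = ("S", [("NP", ["the", "movie"]), ("VP", ["was", "great"])])
print(encode_tree(parse).shape)  # (1024,)
```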
Methods
  • The authors aim to investigate whether KERMIT can be used to create neural network architectures where universal syntactic interpretations are useful: (1) to improve state-of-the-art universal sentence embeddings, especially in computationally light environments, and (2) to syntactically explain decisions.

    The rest of the section describes the experimental set-up, reports the quantitative experimental results of KERMIT, and discusses how KERMITviz can be used to explain inferences made by neural networks on examples.

    4.1 Experimental Set-up

    This section describes the general experimental set-up, the specific configurations adopted in the completely universal and task-specific settings, the computational architecture used, and the datasets.

    The general experimental settings are described hereafter.
  • As the experiments are text classification tasks, the decoder layer of the KERMIT+Transformer architecture is a fully connected layer with the softmax activation function applied to the concatenation of the KERMIT output and the final [CLS] token representation of the selected transformer model.
  • The optimizer used to train the whole architecture is AdamW (Loshchilov and Hutter, 2019) with the learning rate set to 3e-5 (a minimal sketch of this head and optimizer follows this list)
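Below is a minimal sketch of the classification head described above. The dimensions (a 4000-dimensional KERMIT vector, a 768-dimensional [CLS] vector, 5 classes) and the class name are assumptions for illustration, not values reported in this summary.

```python
# Sketch (assumed dimensions/names) of a decoder head over the concatenation
# of the KERMIT output and the transformer's final [CLS] representation,
# trained with AdamW at the learning rate reported above (3e-5).
import torch
import torch.nn as nn

class KermitTransformerHead(nn.Module):
    def __init__(self, kermit_dim: int, cls_dim: int, num_classes: int):
        super().__init__()
        # Single fully connected decoder layer over the concatenated features.
        self.decoder = nn.Linear(kermit_dim + cls_dim, num_classes)

    def forward(self, kermit_out: torch.Tensor, cls_repr: torch.Tensor) -> torch.Tensor:
        features = torch.cat([kermit_out, cls_repr], dim=-1)
        return torch.softmax(self.decoder(features), dim=-1)

# Example usage with made-up dimensions (batch of 2).
head = KermitTransformerHead(kermit_dim=4000, cls_dim=768, num_classes=5)
probs = head(torch.randn(2, 4000), torch.randn(2, 768))

# AdamW with the learning rate mentioned in the section.
optimizer = torch.optim.AdamW(head.parameters(), lr=3e-5)
```

In practice one would usually train on the pre-softmax logits with a cross-entropy loss; the softmax is shown here only because the summary explicitly mentions it as the activation of the decoder layer.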
Results
  • Results from the completely universal experimental setting suggest that universal syntactic interpretations complement syntax in universal sentence embeddings.
  • This conclusion is derived from the following observations of Table 1, which reports results in terms of the accuracy of the different models based on the different datasets.
  • Syntactic information in AGNews seems to be irrelevant, as there is only a small difference in results between BERTBASE with 82.88(±0.09) and BERTBASE-Reverse with 79.72(±0.11), and …
Conclusion
  • Universal syntactic interpretations are valuable language interpretations, which have been developed in years of study.
  • KERMITviz allows the authors to explain how syntactic information is used in classification decisions within networks combining KERMIT with BERT or XLNet.
  • As KERMIT has a clear description of the used syntactic subtrees and gives the possibility of visualizing how syntactic information is exploited during inference, it opens the possibility of devising models that include explicit syntactic inference rules in the training process (a minimal sketch of the relevance-propagation idea behind such visualizations follows this list)
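The keyword list and the references point to layer-wise relevance propagation (Bach et al., 2015) as the kind of mechanism behind such visualizations. Below is a minimal sketch of the LRP epsilon rule for a single linear layer, purely to illustrate how output relevance can be redistributed to inputs; it is not the KERMITviz implementation, and all names and values are illustrative.

```python
# Sketch of the epsilon rule of layer-wise relevance propagation
# (Bach et al., 2015) for one linear layer y = W @ x + b.
import numpy as np

def lrp_epsilon_linear(x, W, b, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs of a linear layer."""
    z = W @ x + b                      # pre-activations, shape (out,)
    z = z + eps * np.sign(z)           # epsilon stabilizer
    s = relevance_out / z              # shape (out,)
    return x * (W.T @ s)               # input relevance, shape (in,)

# Toy example: 3 inputs, 2 outputs; all relevance starts on class 0.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W, b = rng.standard_normal((2, 3)), rng.standard_normal(2)
r_out = np.array([1.0, 0.0])
print(lrp_epsilon_linear(x, W, b, r_out))  # per-input relevance scores
```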
Tables
  • Table 1: Universal Setting - Average accuracy and standard deviation on four text classification tasks. Results derive from 5 runs; markers indicate a statistically significant difference between two results at a 95% confidence level according to the sign test
Funding
  • In this paper, we investigate whether explicit universal syntactic interpretations can be used to improve state-of-the-art universal sentence embeddings and to create neural network architectures where syntax decisions are less obscure and, thus, syntactically explainable
Study subjects and analysis
cases: 3
This may be justified as both XLNet and BERTBASE are trained on Wikipedia, so universal sentence embeddings are already adapted to the specific dataset. Thirdly, in the three cases where syntactic information is relevant (Yelp Review, Yelp Polarity and DBPedia), the complete KERMIT+Transformer outperforms the model based only on the related Transformer, and the difference is statistically significant: 53.72(±0.14) vs. 46.26(±0.13) in Yelp Review, 94.51(±0.05) vs. 92.46(±0.09) in DBPedia and 88.99(±0.17) vs. 81.99(±0.15) in Yelp Polarity for XLNet, and 52.02(±0.06) vs. 42.90(±0.05) in Yelp Review, 97.73(±0.16) vs. 97.11(±0.27) in DBPedia and 87.58(±0.17) vs. 79.21(±0.50) in Yelp Polarity for BERTBASE
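Table 1's caption refers to a sign test at a 95% confidence level for these comparisons. As an illustration only (the exact protocol is not spelled out in this summary), a two-sided paired sign test over per-example correctness indicators could be computed as follows; the data and function name are made up.

```python
# Sketch of a two-sided paired sign test between two classifiers,
# comparing per-example correctness and ignoring ties.
from math import comb

def sign_test(correct_a, correct_b):
    """Two-sided sign test p-value over paired 0/1 correctness indicators."""
    wins_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    wins_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n, k = wins_a + wins_b, min(wins_a, wins_b)
    if n == 0:
        return 1.0
    # Two-sided tail of Binomial(n, 0.5): 2 * P(X <= k)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Toy usage: significant at the 95% level when p < 0.05.
a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
p = sign_test(a, b)
print(p, p < 0.05)  # 0.03125 True
```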

References
  • Abien Fred Agarap. 2018. Deep Learning using Rectified Linear Units (ReLU). CoRR, abs/1803.0.
  • Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):1–46.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pages 1–15.
  • Marco Baroni and Roberto Zamparelli. 2010. Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1183–1193, Cambridge, MA. Association for Computational Linguistics.
  • Yonatan Belinkov and James Glass. 2019. Analysis Methods in Neural Language Processing: A Survey. Transactions of the Association for Computational Linguistics, 7:49–72.
  • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal sentence encoder for English. In EMNLP 2018 - Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Proceedings, pages 169–174.
  • David J. Chalmers. 1992. Syntactic Transformations on Distributed Representations. In Noel Sharkey, editor, Connectionist Natural Language Processing: Readings from Connection Science, pages 46–55. Springer Netherlands, Dordrecht.
  • Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar. Association for Computational Linguistics.
  • Jihun Choi, Kang Min Yoo, and Sang Goo Lee. 2018. Learning to compose task-specific tree structures. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pages 5094–5101.
  • Stephen Clark and Stephen Pulman. 2007. Combining Symbolic and Distributional Models of Meaning. In Proceedings of the AAAI Spring Symposium on Quantum Interaction, Stanford, CA, 2007, pages 52–55.
  • Michael Collins and Nigel Duffy. 2002. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proceedings of ACL'02.
  • Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings, pages 670–680.
  • Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. 2018. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 1:2126–2136.
  • Nello Cristianini and John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
  • Danilo Croce, Daniele Rossini, and Roberto Basili. 2019a. Auditing Deep Learning processes through Kernel-based Explanatory Models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4037–4046, Hong Kong, China. Association for Computational Linguistics.
  • Danilo Croce, Daniele Rossini, and Roberto Basili. 2019b. Neural embeddings: Accurate and readable inferences based on semantic kernels. Natural Language Engineering, 25(4):519–541.
  • Aron Culotta and Jeffrey Sorensen. 2004. Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04, Morristown, NJ, USA. Association for Computational Linguistics.
  • Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. CoRR, abs/1901.0.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs/1810.0.
  • Allyson Ettinger. 2019. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models.
  • Jerry A. Fodor and Zenon W. Pylyshyn. 1988. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2):3–71.
  • Zachary S. L. Foster, Thomas J. Sharpton, and Niklaus J. Grünwald. 2017. Metacoder: An R package for visualization and manipulation of community taxonomic diversity data. PLoS Computational Biology, 13(2).
  • Yoav Goldberg. 2019. Assessing BERT's Syntactic Abilities.
  • Christoph Goller and Andreas Kuechler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks - Conference Proceedings, volume 1, pages 347–352. IEEE.
  • John Hewitt and Christopher D. Manning. 2019. A Structural Probe for Finding Syntax in Word Representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4129–4138, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding Convolutional Neural Networks for Text Classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Ganesh Jawahar, Benoît Sagot, and Djamé Seddah. 2019. What Does BERT Learn about the Structure of Language? In Proceedings of the Conference of the Association for Computational Linguistics, pages 3651–3657. Association for Computational Linguistics (ACL).
  • W. Johnson and J. Lindenstrauss. 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206.
  • Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-Thought Vectors. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS'15, pages 3294–3302, Cambridge, MA, USA. MIT Press.
  • Aran Komatsuzaki. 2019. One Epoch Is All You Need. pages 1–13.
  • Olga Kovaleva, Alexey Romanov, Anna Rogers, and Anna Rumshisky. 2019. Revealing the Dark Secrets of BERT.
  • Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, and Phil Blunsom. 2020. Syntactic Structure Distillation Pretraining For Bidirectional Encoders.
  • Zachary C. Lipton. 2016. The Mythos of Model Interpretability. ICML Workshop on Human Interpretability in Machine Learning, 61(Whi):36–43.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs/1907.1.
  • Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019.
  • Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60, Baltimore, Maryland. Association for Computational Linguistics.
  • David Mareček and Rudolf Rosa. 2019. Extracting Syntactic Trees from Transformer Encoder Self-Attentions. pages 347–349.
  • Jeff Mitchell and Mirella Lapata. 2008. Vector-based Models of Semantic Composition. In Proceedings of ACL-08: HLT, pages 236–244, Columbus, Ohio. Association for Computational Linguistics.
  • Alessandro Moschitti. 2006. Making Tree Kernels practical for Natural Language Learning. In Proceedings of EACL'06, Trento, Italy.
  • Tsendsuren Munkhdalai and Hong Yu. 2017. Neural Tree Indexers for Text Understanding. In Proceedings of the Conference of the Association for Computational Linguistics, volume 1, pages 11–21. NIH Public Access.
  • Daniele Pighin and Alessandro Moschitti. 2010. On Reverse Feature Engineering of Syntactic Tree Kernels. In Conference on Natural Language Learning (CoNLL-2010), Uppsala, Sweden.
  • T. A. Plate. 1995. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623–641.
  • Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1-2):77–105.
  • Andrea Santilli and Fabio Massimo Zanzotto. 2018. SyntNN at SemEval-2018 Task 2: is Syntax Useful for Emoji Prediction? Embedding Syntactic Trees in Multi Layer Perceptrons. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 477–481, New Orleans, Louisiana. Association for Computational Linguistics (ACL).
  • Richard Socher, Cliff Chiung Yu Lin, Andrew Y. Ng, and Christopher D. Manning. 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages 129–136.
  • Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP-CoNLL 2012 - Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference, pages 1201–1211.
  • Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP 2013 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pages 1631–1642. Association for Computational Linguistics.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(56):1929–1958.
  • Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5027–5038.
  • Sandeep Subramanian, Adam Trischler, Yoshua Bengio, and Christopher J. Pal. 2018. Learning general purpose distributed sentence representations via large scale multi-task learning. In 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings.
  • Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1556–1566, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS.
  • Jesse Vig. 2019. A multiscale visualization of attention in the transformer model. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of System Demonstrations, pages 37–42.
  • Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not Explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 11–20, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv, abs/1910.0.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 5754–5764.
  • Fabio Massimo Zanzotto. 2019. Viewpoint: Human-in-the-loop Artificial Intelligence. Journal of Artificial Intelligence Research, 64:243–252.
  • Fabio Massimo Zanzotto and Lorenzo Dell'Arciprete. 2012. Distributed tree kernels. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, volume 1, pages 193–200.
  • Fabio Massimo Zanzotto and Lorenzo Ferrone. 2017. Can we explain natural language inference decisions taken with neural networks? Inference rules in distributed representations. In Proceedings of the International Joint Conference on Neural Networks, volume 2017-May, pages 3680–3687.
  • Fabio Massimo Zanzotto, Ioannis Korkontzelos, Francesca Fallucchi, and Suresh Manandhar. 2010. Estimating linear models for compositional distributional semantics. In Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference, volume 2, pages 1263–1271.
  • Richong Zhang, Zhiyuan Hu, Hongyu Guo, and Yongyi Mao. 2018. Syntax encoding with application in authorship attribution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2742–2753. Association for Computational Linguistics.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level Convolutional Networks for Text Classification. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 649–657. Curran Associates, Inc.
  • Xingxing Zhang, Liang Lu, and Mirella Lapata. 2016. Top-down tree long short-term memory networks. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference, pages 310–320.
  • Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo. 2015. Long short-term memory over recursive structures. In 32nd International Conference on Machine Learning, ICML 2015, volume 2, pages 1604–1612.