End-To-End Learning Of Semantic Role Labeling Using Recurrent Neural Networks
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th ..., (2015): 1127-1137
- Semantic role labeling (SRL) is a form of shallow semantic parsing whose goal is to discover the predicate-argument structure of each predicate in a given input sentence.
- For each target verb all the constituents in the sentence which fill a semantic role of the verb have to be recognized.
- The system first determines whether a constituent is related to the predicate, then decides the specific relation type (Palmer et al., 2010)
- Semantic role labeling is useful as an intermediate step in a wide range of natural language processing (NLP) tasks, such as information extraction (Bastianelli et al., 2013), automatic document categorization (Persson et al., 2009), and question answering (Dan and Lapata, 2007; Surdeanu et al., 2003; Moschitti et al., 2003)
- We propose an end-to-end system using a deep bi-directional long short-term memory (DB-LSTM) model to address the above difficulties
- We propose an end-to-end system based on recurrent topology
- We investigate a traditional natural language processing problem, semantic role labeling, with a DB-LSTM network
- With more sophisticated network designs and training techniques based on LSTM, such as attempts to integrate the parse-tree concept into the LSTM framework (Tai et al., 2015), we believe better performance can be achieved
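The DB-LSTM described above stacks unidirectional LSTM layers whose time direction alternates from layer to layer, so information flows both forward and backward through the stack. The following NumPy sketch illustrates that idea only; the gate ordering, weight shapes, and helper names are assumptions for exposition, not the paper's exact implementation:

```python
import numpy as np

def lstm_layer(x, W, U, b):
    """Run one unidirectional LSTM over x of shape (T, d_in).
    W: (4h, d_in), U: (4h, h), b: (4h,); gates ordered i, f, o, g."""
    h_dim = U.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    outs = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigm(z[:h_dim])                 # input gate
        f = sigm(z[h_dim:2 * h_dim])        # forget gate
        o = sigm(z[2 * h_dim:3 * h_dim])    # output gate
        g = np.tanh(z[3 * h_dim:])          # candidate cell value
        c = f * c + i * g                   # cell state update
        h = o * np.tanh(c)                  # hidden state
        outs.append(h)
    return np.stack(outs)

def deep_bilstm(x, layers):
    """Stack LSTM layers, reversing the time axis on every odd layer so
    successive layers read the sequence in opposite directions
    (the alternating-direction DB-LSTM idea)."""
    out = x
    for k, (W, U, b) in enumerate(layers):
        inp = out[::-1] if k % 2 == 1 else out
        h = lstm_layer(inp, W, U, b)
        out = h[::-1] if k % 2 == 1 else h  # restore original time order
    return out
```

A per-token softmax over the top layer's outputs would then produce the IOB tag scores; the hidden sizes here are arbitrary.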
- The authors mainly evaluated and analyzed the system on the commonly used CoNLL-2005 shared-task data set, and the conclusions are validated on the CoNLL-2012 shared task.
- The CoNLL-2005 data set takes sections 2-21 of the Wall Street Journal (WSJ) data as the training set and section 24 as the development set.
- The test set consists of section 23 of WSJ concatenated with three sections from the Brown corpus (Carreras and Màrquez, 2005).
- The description and separation of train, development and test data set can be found in (Pradhan et al, 2013).
- In this part, the authors analyze the performance of two different networks: the CNN and the LSTM.
- To understand the contribution of each modeling decision, the authors started from a simple model and added more units step by step
- The authors' model achieves an F1 score of 81.07 on the CoNLL-2005 shared task and 81.27 on the CoNLL-2012 shared task, both outperforming previous systems based on parsing results and feature engineering, which rely heavily on expert linguistic knowledge
- With this model, the authors are able to bypass the traditional steps of extracting intermediate NLP features such as POS tags and syntactic parses, and to avoid hand-engineering feature templates.
- The model achieves a strong ability to learn semantic rules without over-fitting, even on such a limited training set.
- It outperforms the convolution method with a large context length.
- Table1: An example sequence with 4 input features: argument, predicate, predicate context (context length 3), and region mark. The "IOB" tagging scheme is used (Collobert et al., 2011)
- Table2: F1 of CNN method on development set and test set of CoNLL-2005 data set
- Table3: F1 with LSTM method on development set and test set of CoNLL-2005 data set and CoNLL-2012 data set. Emb: the type of embedding. d: the number of LSTM layers. ctx-p: predicate context length. mr: region mark feature. h: hidden layer size
- Table4: Comparison with previous methods
- Table5: F1 on each subset and class (CoNLL-2005). (Classes with low statistics are removed.)
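The input features and tagging scheme that Table 1 describes can be sketched in a few lines of plain Python. The function and variable names here are illustrative, not from the paper; the region mark is modeled as a binary flag for whether a word falls inside the predicate context window:

```python
def build_inputs(words, pred_idx, ctx_len=3):
    """Build per-token inputs like Table 1: the word itself, the predicate,
    the predicate context window, and a region mark that is 1 when the
    word falls inside that window."""
    lo = max(0, pred_idx - ctx_len // 2)
    hi = min(len(words), lo + ctx_len)
    ctx = tuple(words[lo:hi])
    pred = words[pred_idx]
    return [(w, pred, ctx, int(lo <= i < hi)) for i, w in enumerate(words)]

def to_iob(spans, n):
    """Convert labeled argument spans [(start, end_exclusive, role)]
    to per-token IOB tags."""
    tags = ["O"] * n
    for s, e, role in spans:
        tags[s] = "B-" + role
        for i in range(s + 1, e):
            tags[i] = "I-" + role
    return tags
```

For example, `to_iob([(0, 3, "A1"), (3, 4, "V")], 7)` tags the first three tokens as a B-A1/I-A1/I-A1 argument span and the fourth as the verb.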
- People solve SRL in two major ways. The first follows the traditional approach widely used in basic NLP problems: a linear classifier is employed with feature templates, and most effort focuses on extracting the feature templates that best describe the text properties of the training corpus. One of the most important features comes from syntactic parsing, although syntactic parsing is itself considered a difficult problem; thus system combination appears to be the general solution.
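To make the feature-template approach concrete, the sketch below extracts a handful of token-level template features for a linear classifier. The templates and names are hypothetical simplifications; real systems of this kind add many syntactic-parse-derived features:

```python
def template_features(words, pos, i, pred_idx):
    """Illustrative feature templates for a linear SRL classifier over
    token i relative to the predicate at pred_idx."""
    return {
        "word=" + words[i],                 # surface form of the token
        "pos=" + pos[i],                    # its POS tag
        "pred=" + words[pred_idx],          # the target predicate
        "pred_pos=" + pos[pred_idx],        # the predicate's POS tag
        "side=" + ("before" if i < pred_idx else
                   "after" if i > pred_idx else "on"),
        "dist=" + str(min(abs(i - pred_idx), 5)),  # clipped distance bucket
    }
```

Each feature string would index a weight in the linear model; designing and pruning such templates by hand is exactly the engineering effort the end-to-end DB-LSTM avoids.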
- Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013.
- Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2013. Textual inference and meaning representation in human robot interaction. In Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora, pages 65–69.
- Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166.
- Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, March.
- Yoshua Bengio, Holger Schwenk, Jean-Sébastien Senécal, Frédéric Morin, and Jean-Luc Gauvain. 2006. Neural probabilistic language models. In Innovations in Machine Learning, volume 194 of Studies in Fuzziness and Soft Computing, pages 137–186. Springer Berlin Heidelberg.
- Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pages 152–164, Ann Arbor, Michigan, June. Association for Computational Linguistics.
- Eugene Charniak and Mark Johnson. 2005. Coarseto-fine n-best parsing and maxent discriminative reranking. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL ’05, pages 173–180, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Eugene Charniak. 2000. A maximum-entropyinspired parser. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, NAACL 2000, pages 132–139, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637, December.
- Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 160–167, New York, NY, USA. ACM.
- Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537, November.
- Shen Dan and Mirella Lapata. 2007. Using semantic roles to improve question answering. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLPCoNLL).
- Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami, Horst Bunke, and Jürgen Schmidhuber. 2009. A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5):855–868.
- Alex Graves, Greg Wayne, and Ivo Danihelka. 2014. Neural Turing machines. arXiv:1410.5401.
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
- Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1746–1751.
- Peter Koomen, Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2005. Generalized inference with multiple semantic role labeling systems. In Proceedings of the 9th Conference on Computational Natural Language Learning, CONLL ’05, pages 181–184, Stroudsburg, PA, USA. Association for Computational Linguistics.
- John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 8th International Conference on Machine Learning, ICML ’01, pages 282–289, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
- Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of phrases and their compositionality. In Advances on Neural Information Processing Systems.
- Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In Advances in Neural Information Processing Systems, pages 2265–2273.
- Alessandro Moschitti, Paul Morarescu, and Sanda M. Harabagiu. 2003. Open domain information extraction via automatic semantic labeling. In FLAIRS Conference’03, pages 397–401.
- Martha Palmer, Daniel Gildea, and Nianwen Xue. 2010. Semantic Role Labeling. Synthesis Lectures on Human Language Technology Series. Morgan and Claypool.
- Jacob Persson, Richard Johansson, and Pierre Nugues. 2009. Text categorization using predicate-argument structures. In Proceedings of NODALIDA, pages 142–149.
- Sameer Pradhan, Kadri Hacioglu, Wayne Ward, James H. Martin, and Daniel Jurafsky. 2005. Semantic role chunking combining complementary syntactic views. In Proceedings of the 9th Conference on Computational Natural Language Learning, CONLL ’05, pages 217–220, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Bjorkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards robust linguistic analysis using ontonotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 143–152, Sofia, Bulgaria, August. Association for Computational Linguistics.
- Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2).
- M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45:2673–2681.
- Mihai Surdeanu, Sanda Harabagiu, John Williams, and Paul Aarseth. 2003. Using predicate-argument structures for information extraction. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 8–15, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Mihai Surdeanu, Lluís Màrquez, Xavier Carreras, and Pere R. Comas. 2007. Combination strategies for semantic role labeling. Journal of Artificial Intelligence Research, 29:105–151.
- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances on Neural Information Processing Systems.
- Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Efficient inference and structured learning for semantic role labeling. Transactions of the Association for Computational Linguistics, 3:29–41.
- Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting on Association for Computational Linguistics, ACL ’15, Stroudsburg, PA, USA. Association for Computational Linguistics.
- Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. 2008. A global joint model for semantic role labeling. Computational Linguistics, 34:161– 191.
- Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2014. Grammar as a foreign language. arXiv:1412.7449.
- Jason Weston, Sumit Chopra, and Antoine Bordes. 2014. Memory networks. arXiv:1410.3916.
- Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2015. Towards ai-complete question answering: A set of prerequisite toy tasks. arXiv:1502.05698.
- Mo Yu, Matthew Gormley, and Mark Dredze. 2014. Factor-based compositional embedding models. In Advances in Neural Information Processing Systems Workshop on Learning Semantics.
- Xiang Zhang and Yann LeCun. 2015. Text understanding from scratch. arXiv:1502.01710.