Alignment-free Cross-lingual Semantic Role Labeling

EMNLP 2020, pp. 3883–3894 (2020)

Abstract

Cross-lingual semantic role labeling (SRL) aims at leveraging resources in a source language to minimize the effort required to construct annotations or models for a new target language. Recent approaches rely on word alignments, machine translation engines, or preprocessing tools such as parsers or taggers. We propose a cross-lingual SRL…

Introduction
  • Semantic role labeling (SRL) is the task of identifying the arguments of semantic predicates in a sentence and labeling them with a set of predefined relations (e.g., “who” did “what” to “whom,” “when,” and “where”)
  • It has emerged as an important technology for a wide spectrum of applications, ranging from machine translation (Aziz et al., 2011; Marcheggiani et al., 2018) to information extraction (Christensen et al., 2011) and summarization (Khan et al., 2015).
  • Much previous work has focused on cross-lingual SRL which aims at leveraging existing resources in a source language to minimize the effort required to construct a model or annotations for a new target language
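To make the "who did what to whom" framing above concrete, here is a minimal, purely illustrative sketch of what a PropBank-style SRL analysis of one sentence might look like; the sentence, the role inventory, and the data structure are our own simplification, not the paper's code.

```python
# Illustrative only: a PropBank-style SRL analysis of
# "Mary sold the book to John", with one entry per predicate.
sentence = ["Mary", "sold", "the", "book", "to", "John"]

# (predicate index, {role label: argument token indices})
srl_analysis = [
    (1, {                  # predicate "sold"
        "A0": [0],         # A0 (agent, "who"): Mary
        "A1": [2, 3],      # A1 (patient, "what"): the book
        "A2": [4, 5],      # A2 (recipient, "to whom"): to John
    }),
]

def roles_for(predicate_index, analysis):
    """Return the role dictionary for a given predicate, if annotated."""
    for pred, roles in analysis:
        if pred == predicate_index:
            return roles
    return {}
```

For example, `roles_for(1, srl_analysis)` recovers the three arguments of "sold", while a token with no annotated predicate yields an empty dictionary.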
Highlights
  • Semantic role labeling (SRL) is the task of identifying the arguments of semantic predicates in a sentence and labeling them with a set of predefined relations (e.g., “who” did “what” to “whom,” “when,” and “where”)
  • Much previous work has focused on cross-lingual SRL which aims at leveraging existing resources in a source language to minimize the effort required to construct a model or annotations for a new target language
  • We propose a novel method for cross-lingual SRL which does not rely on word alignments, machine translation or pre-processing tools such as parsers or taggers
  • Our contributions can be summarized as follows: (a) we propose a knowledge-lean model which does not rely on alignments, machine translation, or sophisticated linguistic preprocessing; (b) we introduce the concept of a semantic role compressor, which is effective at filtering noisy information and can be potentially useful for other cross-lingual tasks; (c) we release two manually annotated datasets which will further advance cross-lingual semantic role labeling, complementing previous work (Aminian et al., 2019; Fei et al., 2020) which reports results on semi-automatically created annotations
  • We compared our model against previous methods on the Universal Proposition Bank (UPB, v1.0; Akbik et al., 2016), which is built upon the Universal Dependency Treebank (UDT, v1.4) and the Proposition Bank (PB, v3.0)
  • We note that our approach significantly outperforms previously published models on these three languages
  • In this paper we developed a cross-lingual SRL model and demonstrated it can effectively leverage unlabeled parallel data without relying on word alignments or any other external tools
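This summary does not spell out the compressor's actual architecture. Purely as a loose illustration of the general idea of compressing a variable-length argument into one fixed-size vector per role, a mean-pooling sketch could look like this (entirely our own construction, not the authors' model):

```python
import numpy as np

def compress_roles(token_vectors, role_spans):
    """One illustrative 'compression': average each argument's token vectors.

    token_vectors: (n_tokens, dim) array of per-token representations.
    role_spans: {role label: list of token indices for that argument}.
    Returns {role label: (dim,) vector}.
    """
    return {
        role: token_vectors[indices].mean(axis=0)
        for role, indices in role_spans.items()
    }

# Four toy 2-d token vectors; roles A0 and A1 cover tokens {0} and {2, 3}.
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]])
compressed = compress_roles(vecs, {"A0": [0], "A1": [2, 3]})
# A1 is the mean of tokens 2 and 3: [3.0, 1.0]
```

The point of such a step is that each role is represented by a single vector regardless of argument length, which is what makes it natural to extend from head words to entire spans, as the conclusion suggests.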
Methods
  • 3.1 Datasets

    The authors trained the model using English as the source language and obtained semantic role labelers in German (DE), Spanish (ES), Finnish (FI), French (FR), Italian (IT), Portuguese (PT), and Chinese (ZH).
  • For English, the authors used the Proposition Bank (v3; Palmer et al., 2005) and the annotations provided as part of the CoNLL-09 shared task (Hajič et al., 2009).
  • All languages in the UPB follow a unified dependency-based SRL annotation scheme.
  • In order to comply with this scheme, the authors converted argument spans in the English Proposition Bank to dependency-based arguments by labeling the syntactic head of each span
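The span-to-head conversion above can be sketched as follows. This is a hedged illustration of the general idea, not the authors' exact procedure: given a dependency parse, the head of a contiguous argument span is taken to be the token whose syntactic head lies outside the span.

```python
def span_head(span, heads):
    """Find the syntactic head of an argument span (illustrative convention).

    span: list of 0-based token indices forming a contiguous argument span.
    heads: heads[i] is the 0-based index of token i's dependency head,
           or -1 for the root.
    Returns the index of the span token whose head attaches outside the span.
    """
    inside = set(span)
    for i in span:
        if heads[i] not in inside:   # this token's head is outside the span
            return i
    return span[0]                   # fallback for degenerate parses

# Toy parse of "Mary sold the book to John" (UD-style attachments):
# Mary->sold, sold->ROOT, the->book, book->sold, to->John, John->sold
heads = [1, -1, 3, 1, 5, 1]
# The span "the book" (tokens 2-3) is headed by "book" (index 3).
```

So the span-based argument "the book" would be converted to a single dependency-based argument labeled on "book", matching the head-labeling scheme described above.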
Results
  • The authors compared the model against several baselines on the UPB test set
  • These include two transfer methods: Bootstrap (Aminian et al., 2017) and CModel (Aminian et al., 2019), which perform annotation projection through parallel data and filter word alignments empirically.
  • This suggests that transferring SRL annotations between languages with similar word orders could be an easier task
Conclusion
  • In this paper the authors developed a cross-lingual SRL model and demonstrated it can effectively leverage unlabeled parallel data without relying on word alignments or any other external tools.
  • While the authors' focus has been on dependency-based SRL, the model can be adapted to span-based annotations (Carreras and Màrquez, 2005; Pradhan et al., 2013).
  • In this case, the semantic role compressor could be modified to represent entire spans rather than just head words while decompression would remain unchanged.
  • The authors plan to extend the framework to semi-supervised learning, where a small number of annotations might be available in the target language
Tables
  • Table1: Annotated data used in our experiments. We show the English source annotations (left column) used for training and corresponding target annotations used for testing in various languages
  • Table2: Hyperparameter settings for input and training (first block), semantic role labeler (second block) and semantic role compressor (third block)
  • Table3: Results (F1) on UPB test sets for six languages. Results for comparison systems are taken from previous papers (Aminian et al., 2019; Fei et al., 2020)
  • Table4: Results (F1) on manually annotated test sets for German, French, and Chinese. Pairwise differences between our model and previous systems are all statistically significant (p < 0.05) using stratified shuffling (Noreen, 1989)
  • Table5: Ablations on manually annotated datasets
  • Table6: Results (F1) on French and Chinese test sets grouped by gold role labels
  • Table7: Number of sentence pairs in Europarl for six languages
Related Work
  • There has been a great deal of interest in cross-lingual transfer learning for SRL (Padó and Lapata, 2009; van der Plas et al., 2011; Kozhevnikov and Titov, 2013; Tiedemann, 2015; Zhao et al., 2018; Chen et al., 2019; Aminian et al., 2019; Fei et al., 2020). The majority of previous work has focused on two types of approaches, namely annotation projection and model transfer.

    A variety of methods have been proposed to improve the quality of annotation projections in the face of alignment noise. These range from word and argument filtering techniques (Padó and Lapata, 2005, 2009), to learning syntax and semantics jointly (van der Plas et al., 2011), and iterative bootstrapping (Akbik et al., 2015; Aminian et al., 2017). In an attempt to reduce the reliance on supervised lexico-syntactic features for the target language, Aminian et al. (2019) make use of word and character features, and filter projected annotations according to projection density. Model transfer does not require parallel corpora or word alignment tools; nevertheless, it relies on accurate features such as POS tags (McDonald et al., 2013) or syntactic parse trees (Kozhevnikov and Titov, 2013) to generalize across languages. Adversarial training is commonly used to extract language-agnostic features, thereby improving the performance of cross-lingual systems (Chen et al., 2019; Ahmad et al., 2019b).
Funding
  • This work was supported by the European Research Council (award number 681760, “Translating Multiple Modalities into Text”)
References
  • Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng. 2019a. On difficulties of cross-lingual transfer with order differences: A case study on dependency parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2440–2452, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, and Nanyun Peng. 2019b. Cross-lingual dependency parsing with unlabeled auxiliary languages. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 372–382, Hong Kong, China. Association for Computational Linguistics.
  • Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2015. Generating high quality proposition banks for multilingual semantic role labeling. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 397–407, Beijing, China. Association for Computational Linguistics.
  • Alan Akbik, Vishwajeet Kumar, and Yunyao Li. 2016. Towards semi-automatic generation of proposition banks for low-resource languages. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 993–998, Austin, Texas. Association for Computational Linguistics.
  • Alan Akbik and Yunyao Li. 2016. K-SRL: Instance-based learning for semantic role labeling. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 599–608, Osaka, Japan. The COLING 2016 Organizing Committee.
  • Maryam Aminian, Mohammad Sadegh Rasooli, and Mona Diab. 2017. Transferring semantic roles using translation and syntactic information. arXiv preprint arXiv:1710.01411.
  • Maryam Aminian, Mohammad Sadegh Rasooli, and Mona Diab. 2019. Cross-lingual transfer of semantic roles: From raw text to semantic roles. arXiv preprint arXiv:1904.03256.
  • Wilker Aziz, Miguel Rios, and Lucia Specia. 2011. Shallow semantic trees for SMT. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 316–322, Edinburgh, Scotland.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, California.
  • Rui Cai and Mirella Lapata. 2019. Syntax-aware semantic role labeling without parsing. Transactions of the Association for Computational Linguistics, 7:343–356.
  • Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pages 152–164, Ann Arbor, Michigan. Association for Computational Linguistics.
  • Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, and Claire Cardie. 2019. Multi-source cross-lingual model transfer: Learning what to share. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3098–3112, Florence, Italy. Association for Computational Linguistics.
  • Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2011. An analysis of open information extraction based on semantic role labeling. In Proceedings of the 6th International Conference on Knowledge Capture, pages 113–119, Banff, Canada.
  • Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.
  • Angel Daza and Anette Frank. 2019a. Translate and label! An encoder-decoder approach for cross-lingual semantic role labeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 603–615, Hong Kong, China. Association for Computational Linguistics.
  • Angel Daza and Anette Frank. 2019b. Translate and label! An encoder-decoder approach for cross-lingual semantic role labeling. arXiv preprint arXiv:1908.11326.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Hao Fei, Meishan Zhang, and Donghong Ji. 2020. Cross-lingual semantic role labeling with high-quality translated training corpus. arXiv preprint arXiv:2004.06295.
  • Jiang Guo, Darsh Shah, and Regina Barzilay. 2018. Multi-source domain adaptation with mixture of experts. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4694–4703, Brussels, Belgium. Association for Computational Linguistics.
  • Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 1–18, Boulder, Colorado. Association for Computational Linguistics.
  • Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what's next. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 473–483, Vancouver, Canada.
  • Shexia He, Zuchao Li, and Hai Zhao. 2019. Syntax-aware multilingual semantic role labeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5350–5359, Hong Kong, China. Association for Computational Linguistics.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780.
  • Atif Khan, Naomie Salim, and Yogan Jaya Kumar. 2015. A framework for multi-document abstractive summarization based on semantic role labelling. Applied Soft Computing, 30:737–747.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, pages 79–86.
  • Mikhail Kozhevnikov and Ivan Titov. 2013. Cross-lingual transfer of semantic role labeling models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1190–1200.
  • Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data.
  • Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, and Pascale Fung. 2019. Zero-shot cross-lingual dialogue systems with transferable latent variables. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1297–1303, Hong Kong, China. Association for Computational Linguistics.
  • Diego Marcheggiani, Joost Bastings, and Ivan Titov. 2018. Exploiting semantics in neural machine translation with graph convolutional networks. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), New Orleans, USA.
  • Diego Marcheggiani, Anton Frolov, and Ivan Titov. 2017. A simple and accurate syntax-agnostic neural model for dependency-based semantic role labeling. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 411–420, Vancouver, Canada.
  • Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 92–97, Sofia, Bulgaria. Association for Computational Linguistics.
  • Phoebe Mulcaire, Swabha Swayamdipta, and Noah A. Smith. 2018. Polyglot semantic role labeling. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 667–672, Melbourne, Australia. Association for Computational Linguistics.
  • Eric W. Noreen. 1989. Computer-Intensive Methods for Testing Hypotheses. Wiley, New York.
  • Sebastian Padó and Mirella Lapata. 2005. Cross-linguistic projection of role-semantic information. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 859–866. Association for Computational Linguistics.
  • Sebastian Padó and Mirella Lapata. 2009. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36:307–340.
  • Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.
  • Lonneke van der Plas, Paola Merlo, and James Henderson. 2011. Scaling up automatic cross-lingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 299–304, Portland, Oregon, USA. Association for Computational Linguistics.
  • Lonneke van der Plas, Tanja Samardžić, and Paola Merlo. 2010. Cross-lingual validity of PropBank in the manual annotation of French. In Proceedings of the Fourth Linguistic Annotation Workshop, pages 113–117, Uppsala, Sweden. Association for Computational Linguistics.
  • Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards robust linguistic analysis using OntoNotes. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 143–152.
  • Mohammad Sadegh Rasooli and Michael Collins. 2015. Density-driven cross-lingual transfer of dependency parsers. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 328–338, Lisbon, Portugal. Association for Computational Linguistics.
  • Michael Roth and Mirella Lapata. 2016. Neural semantic role labeling with dependency path embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1192–1202, Berlin, Germany.
  • Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2016. Greedy, joint syntactic-semantic parsing with stack LSTMs. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 187–197, Berlin, Germany. Association for Computational Linguistics.
  • Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 477–487, Montreal, Canada. Association for Computational Linguistics.
  • Jörg Tiedemann. 2015. Improving the cross-lingual projection of syntactic dependencies. In Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11–13, 2015, Vilnius, Lithuania, pages 191–199. Linköping University Electronic Press.
  • Bright Xu. 2019. NLP Chinese corpus: Large scale Chinese corpus for NLP.
  • Wajdi Zaghouani, Mona Diab, Aous Mansouri, Sameer Pradhan, and Martha Palmer. 2010. The revised Arabic PropBank. In Proceedings of the Fourth Linguistic Annotation Workshop, pages 222–226, Uppsala, Sweden. Association for Computational Linguistics.
  • Han Zhao, Shanghang Zhang, Guanhang Wu, Geoffrey J. Gordon, et al. 2018. Multiple source domain adaptation with adversarial learning.
  • Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1127–1137, Beijing, China.