Evaluating Discourse Phenomena in Neural Machine Translation

North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2018.

DOI: https://doi.org/10.18653/v1/N18-1118

Abstract:

For machine translation to tackle discourse phenomena, models must have access to extra-sentential linguistic context. There has been recent interest in modelling context in neural machine translation (NMT), but models have been principally evaluated with standard automatic metrics, poorly adapted to evaluating discourse phenomena. In thi…

Introduction
  • Machine translation (MT) systems typically translate sentences independently of each other.
  • The correct translation choice is determined by linguistic context, which can be outside the current sentence.
  • This disambiguating context can be source- or target-side; the correct translation of the anaphoric pronouns "it" and "they" depends on the gender of the translated antecedent (1).
  • A translation may depend on target factors, but may be triggered by source effects and linguistic mechanisms such as repetition or alignment (2).
  • Source or target information may provide the appropriate context (3)
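The kind of ambiguity above is what the paper's contrastive discourse test sets probe: a model is credited when it scores the correct translation above a minimally different incorrect variant. A minimal sketch of that protocol, with a toy scorer standing in for an NMT model's log-probabilities (the example sentences and the `toy_score` heuristic are illustrative):

```python
# Sketch of contrastive evaluation for anaphoric pronoun translation.
# A real setup would replace toy_score with an NMT model's scoring function.

def contrastive_accuracy(examples, score):
    """Each example pairs one source with a correct and an incorrect translation.

    The model 'chooses' correctly when it assigns the correct translation a
    higher score (log-probability) than the contrastive variant.
    """
    n_correct = sum(
        1 for ex in examples
        if score(ex["src"], ex["correct"]) > score(ex["src"], ex["incorrect"])
    )
    return n_correct / len(examples)

# Toy scorer: prefers the pronoun that agrees with the feminine antecedent
# 'voiture' (purely illustrative, not a model).
def toy_score(src, tgt):
    return 1.0 if ("voiture" in tgt and "Elle" in tgt) else 0.0

examples = [
    {
        "src": "I bought a car. It is red.",
        "correct": "J'ai acheté une voiture. Elle est rouge.",
        "incorrect": "J'ai acheté une voiture. Il est rouge.",
    },
]

print(contrastive_accuracy(examples, toy_score))  # 1.0
```

Because only a ranking between the two candidates is needed, this evaluation works with any model that can score a given translation, without decoding.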
Highlights
  • Machine translation (MT) systems typically translate sentences independently of each other
  • The correct translation choice is determined by linguistic context, which can be outside the current sentence
  • We review contextual neural machine translation strategies trained on subtitles in a high-resource setting (Proceedings of NAACL-HLT 2018, pages 1304–1313, New Orleans, Louisiana, June 1–6, 2018)
  • The models are described in the first half of Table 1: #In is the number of input sentences (with the type of auxiliary input indicated by Aux.), #Out is the number of sentences translated, and #Enc is the number of encoders used to encode the input sentences
  • We have presented an evaluation of discourse-level neural machine translation models through the use of two discourse test sets targeted at coreference and lexical coherence/cohesion
  • The observation that the decoding strategy is very effective for handling previous context suggests that techniques such as stream decoding, which keep a constant flow of contextual information in the recurrent state of the decoder, could be very promising for future research
Methods
  • Each of the multi-encoder strategies is tested using the previous source and target sentences as an additional input in order to test which is the most useful disambiguating context.
  • Two additional models tested are triple-encoder models, which use both the previous source and target.
  • 4.1 Data.
  • Models are trained and tested on fan-produced parallel subtitles from OpenSubtitles2016 (Lison and Tiedemann, 2016).
  • The data is first corrected using heuristics (e.g. minor corrections of OCR errors).
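The multi-encoder strategies above attend over the current sentence and the context sentence separately and then merge the two summaries; one of the combination strategies the paper compares is a gated merge. A minimal NumPy sketch, where the dimensions, dot-product attention, and sigmoid gate parameters are all illustrative rather than the paper's exact architecture:

```python
import numpy as np

def attention(query, keys):
    """Dot-product attention returning a weighted sum of the keys."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

def gated_combine(query, main_states, context_states, w_gate):
    """Attend separately over the current-sentence and context encoders,
    then merge the two context vectors with a sigmoid gate."""
    c_main = attention(query, main_states)
    c_ctx = attention(query, context_states)
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([c_main, c_ctx]))))
    return gate * c_main + (1.0 - gate) * c_ctx

rng = np.random.default_rng(0)
d = 4
combined = gated_combine(
    rng.normal(size=d),            # decoder query (hidden state)
    rng.normal(size=(5, d)),       # encoder states for the current sentence
    rng.normal(size=(5, d)),       # encoder states for the previous sentence
    rng.normal(size=(d, 2 * d)),   # gate parameters (hypothetical)
)
print(combined.shape)  # (4,)
```

A triple-encoder variant would simply add a third encoder (previous target sentence) and extend the merge to three context vectors.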
Results
  • Overall translation quality is evaluated using the traditional automatic metric BLEU (Papineni et al., 2002) (Tab. 1) to ensure that the models do not degrade overall performance.
  • The models are described in the first half of Table 1: #In is the number of input sentences (with the type of auxiliary input indicated by Aux.), #Out is the number of sentences translated, and #Enc is the number of encoders used to encode the input sentences.
  • There is no clear second-best model, since performance depends strongly on the test set used.
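For reference, BLEU combines clipped n-gram precisions with a brevity penalty. A minimal sentence-level sketch (real evaluations use corpus-level BLEU with standard tokenisation, e.g. via a tool such as sacreBLEU; the smoothing constant here is an assumption to avoid log(0)):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU (Papineni et al., 2002), illustrative version."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        # crude smoothing so a missing higher-order match doesn't zero the score
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0
```

As the paper notes, such surface-overlap metrics are poorly suited to discourse phenomena: swapping "il" for "elle" changes at most one n-gram, so BLEU barely moves even when the pronoun is wrong, which is why the targeted test sets are needed.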
Conclusion
  • The authors have presented an evaluation of discourse-level NMT models through the use of two discourse test sets targeted at coreference and lexical coherence/cohesion.
  • The authors have shown that multi-encoder architectures alone have a limited capacity to exploit discourse-level context: results are poor for coreference and more promising for coherence/cohesion, though there is room for improvement.
  • The authors' novel combination of contextual strategies greatly outperforms existing models.
  • This strategy uses the previous source sentence as an auxiliary input and decodes both the current and previous sentence.
  • The observation that the decoding strategy is very effective for handling previous context suggests that techniques such as stream decoding, which keep a constant flow of contextual information in the recurrent state of the decoder, could be very promising for future research.
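The winning strategy above decodes the previous and current sentences together, in the spirit of the concatenation approach of Tiedemann and Scherrer (2017). A sketch of how such "2-to-2" training pairs can be prepared, where the `<CONCAT>` break token and the helper name are illustrative:

```python
# Build training pairs in which previous and current sentences are joined
# on both source and target side, so the decoder translates both at once
# (cf. Tiedemann and Scherrer, 2017). Token name <CONCAT> is an assumption.

BREAK = "<CONCAT>"

def make_2to2_pairs(src_sents, tgt_sents):
    """Pair each (previous + current) source with its (previous + current) target."""
    pairs = []
    prev_src, prev_tgt = "", ""
    for src, tgt in zip(src_sents, tgt_sents):
        pairs.append((f"{prev_src} {BREAK} {src}".strip(),
                      f"{prev_tgt} {BREAK} {tgt}".strip()))
        prev_src, prev_tgt = src, tgt
    return pairs

pairs = make_2to2_pairs(
    ["I bought a car .", "It is red ."],
    ["J'ai acheté une voiture .", "Elle est rouge ."],
)
for src, tgt in pairs:
    print(src, "|||", tgt)
```

At test time only the portion after the break token is kept as the translation of the current sentence; the decoded previous sentence serves purely to condition the decoder's state.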
Tables
  • Table1: Results (de-tokenised, cased BLEU) of the ensembled models on four different test sets, each containing three films from each film genre. The best, second- and third-best results are highlighted by decreasingly dark shades of green
  • Table2: Results on the discourse test sets (% correct). Results on the coreference set are also given for each pronoun class. CORR. and SEMI correspond respectively to the “correct” and “semi-correct” examples. The best, second- and third-best results are highlighted by decreasingly dark shades of green
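The per-class breakdown in Table 2 amounts to counting, for each pronoun class, the fraction of contrastive examples where the model ranked the correct translation first. A small sketch of that aggregation (the record layout and the `chosen` field are illustrative):

```python
from collections import defaultdict

def percent_correct(examples):
    """Percentage of correctly ranked examples, broken down per pronoun class."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ex in examples:
        totals[ex["class"]] += 1
        if ex["chosen"] == "correct":
            correct[ex["class"]] += 1
    return {cls: 100.0 * correct[cls] / totals[cls] for cls in totals}

examples = [
    {"class": "il", "chosen": "correct"},
    {"class": "il", "chosen": "incorrect"},
    {"class": "elle", "chosen": "correct"},
]
print(percent_correct(examples))  # {'il': 50.0, 'elle': 100.0}
```

For the coherence/cohesion set, the same tally would be kept separately for the "correct" and "semi-correct" example subsets (the CORR. and SEMI columns).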
Funding
  • Rico Sennrich has received funding from the Swiss National Science Foundation (SNF) in the project CoNTra (grant number 105212 169888)
  • This project has also received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements 644333 (SUMMA) and 644402 (HimL)
References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations. ICLR'15. arXiv:1409.0473.
  • Rachel Bawden. 2016. Cross-lingual Pronoun Prediction with Linguistically Informed Features. In Proceedings of the 1st Conference on Machine Translation. Berlin, Germany, WMT'16, pages 564–570.
  • Ozan Caglayan, Loïc Barrault, and Fethi Bougares. 2016. Multimodal Attention for Neural Machine Translation. arXiv:1609.03976.
  • Marine Carpuat. 2009. One Translation per Discourse. In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions. Boulder, Colorado, USA, SEW'09, pages 19–27.
  • Robert De Beaugrande and Wolfgang Dressler. 1981. Introduction to Text Linguistics. Longman, London.
  • Liane Guillou. 2016. Incorporating Pronoun Function into Statistical Machine Translation. Ph.D. thesis, School of Informatics, University of Edinburgh.
  • Liane Guillou and Christian Hardmeier. 2016. PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation. In Proceedings of the 10th Language Resources and Evaluation Conference. Portorož, Slovenia, LREC'16, pages 636–643.
  • Liane Guillou, Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, Mauro Cettolo, Bonnie Webber, and Andrei Popescu-Belis. 2016. Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction. In Proceedings of the 1st Conference on Machine Translation. Berlin, Germany, WMT'16, pages 525–542.
  • Christian Hardmeier. 2014. Discourse in Statistical Machine Translation. Ph.D. thesis, Uppsala University, Department of Linguistics and Philology, Uppsala, Sweden.
  • Po-Yao Huang, Frederick Liu, Sz-Rung Shiang, Jean Oh, and Chris Dyer. 2016. Attention-based Multimodal Neural Machine Translation. In Proceedings of the 1st Conference on Machine Translation. Berlin, Germany, volume 2 of WMT'16, pages 639–645.
  • Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, EMNLP'17, pages 2476–2486.
  • Sébastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. 2017a. Does Neural Machine Translation Benefit from Larger Context? arXiv:1704.05135.
  • Sébastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. 2017b. Neural Machine Translation for Cross-Lingual Pronoun Prediction. In Proceedings of the 3rd Workshop on Discourse in Machine Translation. Copenhagen, Denmark, DISCOMT'17, pages 54–57.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. Prague, Czech Republic, ACL'07, pages 177–180.
  • Jindřich Libovický and Jindřich Helcl. 2017. Attention Strategies for Multi-Source Sequence-to-Sequence Learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada, ACL'17, pages 196–202.
  • Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th Language Resources and Evaluation Conference. Portorož, Slovenia, LREC'16, pages 923–929.
  • Sharid Loáiciga, Sara Stymne, Preslav Nakov, Christian Hardmeier, Jörg Tiedemann, Mauro Cettolo, and Yannick Versley. 2017. Findings of the 2017 DiscoMT Shared Task on Cross-lingual Pronoun Prediction. In Proceedings of the 3rd Workshop on Discourse in Machine Translation. Copenhagen, Denmark, DISCOMT'17, pages 1–16.
  • Miceli Barone, Jozef Mokrý, and Maria Nădejde. 2017. Nematus: a Toolkit for Neural Machine Translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, EACL'17, pages 65–68.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, USA, ACL'02, pages 311–318.
  • Annette Rios Gonzales, Laura Mascarell, and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the 2nd Conference on Machine Translation. Copenhagen, Denmark, WMT'17, pages 11–19.
  • Carolina Scarton and Lucia Specia. 2015. A Quantitative Analysis of Discourse Phenomena in Machine Translation. Discours [online] (16). https://doi.org/10.4000/discours.9047.
  • Rico Sennrich. 2017. How Grammatical is Character-level Neural Machine Translation? In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, EACL'17, pages 376–382.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, ACL'16, pages 1715–1725.
  • Jörg Tiedemann and Yves Scherrer. 2017. Neural Machine Translation with Extended Context. In Proceedings of the 3rd Workshop on Discourse in Machine Translation. Copenhagen, Denmark, DISCOMT'17, pages 82–92.
  • Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. 2017. Exploiting Cross-Sentence Context for Neural Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, EMNLP'17, pages 2816–2821.
  • Barret Zoph and Kevin Knight. 2016. Multi-source Neural Translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics. San Diego, California, USA, NAACL'16, pages 30–34.