Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning

EMNLP 2020, pp. 2242–2254


Abstract

Document-level neural machine translation has yielded attractive improvements. However, the majority of existing methods roughly use all context sentences in a fixed scope. They neglect the fact that different source sentences need different sizes of context. To address this problem, we propose an effective approach to select dynamic context …

Introduction
  • Neural machine translation (NMT) has achieved great progress in recent years (Cho et al., 2014; Bahdanau et al., 2015; Luong et al., 2015; Vaswani et al., 2017). However, when fed an entire document, standard NMT systems translate sentences in isolation, without considering cross-sentence dependencies.
  • There is still an issue that has received less attention: which context sentences should be used when translating a source sentence?
  • The authors conduct an experiment to verify an intuition: the translation of different source sentences requires different context.
  • The authors obtain dynamic context sentences that achieve the best BLEU scores by traversing all context combinations for each source sentence (a rough sketch of this search follows this list).
  • Experiments indicate that only a limited number of context sentences are really useful, and they change with the source sentence.
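A rough sketch of that oracle search, assuming a hypothetical `translate_with_context` hook into a document-level NMT model and sentence-level BLEU from the sacrebleu toolkit (neither is part of the paper's released code):

```python
from itertools import combinations

import sacrebleu


def oracle_context(source, candidates, reference, translate_with_context):
    """Brute-force search over context subsets, keeping the one with the best
    sentence-level BLEU. `translate_with_context(source, context)` is a
    hypothetical stand-in for a DocNMT model's decoding function."""
    best_bleu, best_context = -1.0, []
    # Try every subset of candidate context sentences, including the empty set.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            hypothesis = translate_with_context(source, list(subset))
            bleu = sacrebleu.sentence_bleu(hypothesis, [reference]).score
            if bleu > best_bleu:
                best_bleu, best_context = bleu, list(subset)
    return best_context, best_bleu
```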
Highlights
  • Neural machine translation (NMT) has achieved great progress in recent years (Cho et al., 2014; Bahdanau et al., 2015; Luong et al., 2015; Vaswani et al., 2017); however, when fed an entire document, standard NMT systems translate sentences in isolation without considering cross-sentence dependencies
  • Experiments show that our approach can significantly improve the performance of document-level neural machine translation (DocNMT) models with the selected dynamic context sentences
  • We propose an effective reward related to translation quality to guide the dynamic selection of context sentences and the optimization of parameters in the DocNMT model
  • To make a fair comparison with existing DocNMT models that use a fixed context size, we propose a size-first strategy that selects a fixed number of context sentences with the highest probability, excluding the special “NON” option (see the sketch after this list)
  • The performance of DocNMT models with fixed context is shown in rows 2–5
  • We train the whole model via reinforcement learning, and design a novel reward to encourage the selection of useful context sentences
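A minimal sketch of the two selection strategies, assuming the context scorer has already produced a probability for each candidate sentence plus the special “NON” (empty-context) option; the probability-first reading below, which keeps every candidate that outscores “NON”, is an interpretation rather than the paper's exact rule:

```python
def size_first(probs, k):
    """DCS-SF: keep the k candidates with the highest probability, ignoring the
    special "NON" option, so the context size matches fixed-context baselines."""
    ranked = sorted((i for i in probs if i != "NON"), key=probs.get, reverse=True)
    return ranked[:k]


def probability_first(probs):
    """DCS-PF (one plausible reading): keep every candidate whose probability
    exceeds that of "NON", so the context size varies per source sentence."""
    return [i for i in probs if i != "NON" and probs[i] > probs["NON"]]


# Candidate indices -1..-4 denote the four previous sentences in the document.
scores = {-1: 0.45, -2: 0.30, -3: 0.05, -4: 0.02, "NON": 0.18}
print(size_first(scores, 2))      # [-1, -2]
print(probability_first(scores))  # [-1, -2]
```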
Methods
  • The DCS-PF (probability-first) strategy can still improve the original TDNMT by +0.52 BLEU (19.91 vs. 19.39).
  • It indicates that useless context sentences still exist in a small scope.
  • Rows 6–16 show the models trained and tested with dynamic context settings.
  • Rows 6 and 7 show a lower bound that randomly selects a fixed number of context sentences.
  • The results are similar to those of the original models using a fixed number of previous sentences.
  • In contrast to random selection, the approach can select the same number of context sentences but ones that are really helpful for generating better translations.
Results
  • The authors use the BLEU score (Papineni et al., 2002) to evaluate translation quality (a minimal evaluation sketch follows this list).
  • Table 3 shows the performance of models utilizing different context settings.
  • Comparison with Fixed Context Methods.
  • The performance of DocNMT models with fixed context is shown in rows 2–5.
  • Rows 2 and 3 follow the context settings in the published papers.
  • It can be found that using more context sentences indiscriminately does not bring significant BLEU improvement.
  • Instead, it increases the computational cost.
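A minimal corpus-level BLEU computation with the sacrebleu toolkit; the paper itself may use a different scorer, so this is only an illustration of the metric:

```python
import sacrebleu

hypotheses = ["the cat sat on the mat .", "he went home ."]
# One list of references, aligned with the hypotheses (a single reference each).
references = [["the cat is on the mat .", "he went home ."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```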
Conclusion
  • The authors propose a dynamic selection method to choose a variable number of context sentences for document-level translation.
  • The candidate context sentences are scored and selected by two proposed strategies.
  • The authors train the whole model via reinforcement learning and design a novel reward to encourage the selection of useful context sentences (a generic sketch of such an update follows this list).
  • When applied to existing DocNMT models, the approach can improve translation quality significantly.
  • In future work, the authors will select context sentences from a larger candidate space and explore more effective ways to extend the approach to select target-side context sentences.
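A generic sketch of what such a reinforcement-learning update could look like. Because the paper cites Rennie et al. (2017), a self-critical REINFORCE step with a BLEU-based reward is assumed here; `scorer`, `translate_with_context`, and `sentence_bleu` are hypothetical hooks, and this is not a claim about the paper's exact reward definition:

```python
import torch


def self_critical_step(scorer, optimizer, source, candidates, reference,
                       translate_with_context, sentence_bleu):
    """One REINFORCE update of the context scorer with a self-critical baseline:
    reward = BLEU(sampled context) - BLEU(greedily selected context)."""
    probs = scorer(source, candidates)            # shape: (num_candidates,)
    dist = torch.distributions.Bernoulli(probs)

    sampled = dist.sample()                       # exploratory selection
    greedy = (probs > 0.5).float()                # baseline selection

    sampled_ctx = [c for c, keep in zip(candidates, sampled.tolist()) if keep]
    greedy_ctx = [c for c, keep in zip(candidates, greedy.tolist()) if keep]

    reward = sentence_bleu(translate_with_context(source, sampled_ctx), reference)
    baseline = sentence_bleu(translate_with_context(source, greedy_ctx), reference)

    # REINFORCE: push up the log-probability of selections that beat the baseline.
    loss = -(reward - baseline) * dist.log_prob(sampled).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```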
Tables
  • Table1: The BLEU (%) scores with different context settings. “Model1” and “Model2” are trained with previous 2 and 6 context sentences, respectively. Underlined results indicate that training and test context settings are consistent
  • Table2: Dataset statistics in the number of sentences
  • Table3: Performance of models on BLEU (%) using different context settings. “full” means using all context in the scope. “random”, “attend”, and “select” stand for selecting sentences randomly, implicitly based on attention weights, and explicitly by our approaches, respectively. “dyn” stands for dynamic size. “DCS-SF” and “DCS-PF” mean dynamic context selection by the size-first and probability-first strategies, respectively. All results using “DCS” are significantly better (p < 0.05) than the corresponding original DocNMT models
  • Table4: Effect of training settings for DocNMT models. BLEU scores are measured with TDNMT on the En→De Europarl development set. “✓” means the module is trained while “×” means it is not. Row 1 stands for the original TDNMT model
  • Table5: Results of empty context prediction on 500 sentences with human annotation as reference
  • Table6: BLEU (%) scores on the context-empty and context-nonempty test sets. “+” stands for the improvement when compared with TDNMT
  • Table7: Accuracy (%) of context selection on the discourse phenomena test sets. A model contains two rows: upper – exact match, lower – selected context contains the golden answer
  • Table8: Accuracy (%) of discourse phenomena
  • Table9: BLEU (%) scores on the En→De TED development set using different numbers of layers in the context scorer
  • Table10: Statistics of human annotation for empty context. A1 and A2 stand for two annotators
  • Table11: Statistics of our approach (Ours) for empty context prediction
Funding
  • The research work described in this paper has been supported by the Natural Science Foundation of China under Grants No. U1836221 and 61673380
  • The research work in this paper has also been supported by Beijing Advanced Innovation Center for Language Resources and Beijing Academy of Artificial Intelligence (BAAI2019QN0504)
Reference
  • Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. 2018. Evaluating discourse phenomena in neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1304–1313, New Orleans, Louisiana. Association for Computational Linguistics.
  • Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 675–686, Melbourne, Australia. Association for Computational Linguistics.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar. Association for Computational Linguistics.
  • Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Sebastien Jean and Kyunghyun Cho. 2019. Contextaware learning for neural machine translation. arXiv preprint arXiv:1903.04715.
  • Sebastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. 2017. Does neural machine translation benefit from larger context? arXiv preprint arXiv:1704.05135.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR.
  • He Bai, Yu Zhou, Jiajun Zhang, Liang Zhao, Mei-Yuh Hwang, and Chengqing Zong. 2018. Source critical reinforcement learning for transferring spoken language understanding to a new language. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3597–3607, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • Marcin Junczys-Dowmunt. 2019. Microsoft translator at WMT 2019: Towards large-scale document-level neural machine translation. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 225–233, Florence, Italy. Association for Computational Linguistics.
  • Prathyusha Jwalapuram, Shafiq Joty, Irina Temnikova, and Preslav Nakov. 2019. Evaluating pronominal anaphora in machine translation: An evaluation measure and a test suite. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2964–2975, Hong Kong, China. Association for Computational Linguistics.
  • Xiaomian Kang and Chengqing Zong. 2020. Fusion of discourse structural position encoding for neural machine translation. Chinese Journal of Intelligent Science and Technology, 2(2):144–152.
  • Yunsu Kim, Duc Thanh Tran, and Hermann Ney. 2019. When and why is document-level context useful in neural machine translation? In Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), pages 24–34, Hong Kong, China. Association for Computational Linguistics.
  • Ryuichiro Kimura, Shohei Iida, Hongyi Cui, Po-Hsuan Hung, Takehito Utsuro, and Masaaki Nagata. 2019. Selecting informative context sentence by forced back-translation. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 162–171, Dublin, Ireland. European Association for Machine Translation.
  • Shaohui Kuang, Deyi Xiong, Weihua Luo, and Guodong Zhou. 2018. Modeling coherence for neural machine translation with dynamic and topic caches. In Proceedings of the 27th International Conference on Computational Linguistics, pages 596–606, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421, Lisbon, Portugal. Association for Computational Linguistics.
  • Shuming Ma, Dongdong Zhang, and Ming Zhou. 2020. A simple and effective unified encoder for documentlevel machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3505–3511, Online. Association for Computational Linguistics.
  • Andre Martins and Ramon Astudillo. 2016. From softmax to sparsemax: A sparse model of attention and multi-label classification. volume 48 of Proceedings of Machine Learning Research, pages 1614–1623, New York, New York, USA. PMLR.
  • Sameen Maruf and Gholamreza Haffari. 2018. Document context neural machine translation with memory networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1275– 1284, Melbourne, Australia. Association for Computational Linguistics.
  • Sameen Maruf, Andre F. T. Martins, and Gholamreza Haffari. 2019. Selective attention for context-aware neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3092–3102, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, and James Henderson. 2018. Document-level neural machine translation with hierarchical attention networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2947–2954, Brussels, Belgium. Association for Computational Linguistics.
  • Mathias Muller, Annette Rios, Elena Voita, and Rico Sennrich. 2018. A large-scale test set for the evaluation of context-aware pronoun translation in neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 61–72, Brussels, Belgium. Association for Computational Linguistics.
  • Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
  • Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.
  • Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. 2017. Self-critical sequence training for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7008–7024.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715– 1725, Berlin, Germany. Association for Computational Linguistics.
  • Xin Tan, Longyin Zhang, Deyi Xiong, and Guodong Zhou. 2019. Hierarchical modeling of global context for document-level neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1576– 1585, Hong Kong, China. Association for Computational Linguistics.
  • Jorg Tiedemann and Yves Scherrer. 2017. Neural machine translation with extended context. In Proceedings of the Third Workshop on Discourse in Machine Translation, pages 82–92, Copenhagen, Denmark. Association for Computational Linguistics.
  • Zhaopeng Tu, Yang Liu, Shuming Shi, and Tong Zhang. 2018. Learning to remember translation history with a continuous cache. Transactions of the Association for Computational Linguistics, 6:407–420.
  • Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling coverage for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 76– 85, Berlin, Germany. Association for Computational Linguistics.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
  • Elena Voita, Rico Sennrich, and Ivan Titov. 2019a. Context-aware monolingual repair for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLPIJCNLP), pages 877–886, Hong Kong, China. Association for Computational Linguistics.
  • Elena Voita, Rico Sennrich, and Ivan Titov. 2019b. When a good translation is wrong in context: Context-aware machine translation improves on deixis, ellipsis, and lexical cohesion. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1198–1212, Florence, Italy. Association for Computational Linguistics.
  • Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. Context-aware neural machine translation learns anaphora resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1264–1274, Melbourne, Australia. Association for Computational Linguistics.
  • Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. 2017. Exploiting cross-sentence context for neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2826–2831, Copenhagen, Denmark. Association for Computational Linguistics.
  • Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2019. A compact and language-sensitive multilingual translation method. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1213–1223, Florence, Italy. Association for Computational Linguistics.
  • Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, and TieYan Liu. 2018. A study of reinforcement learning for neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3612–3621, Brussels, Belgium. Association for Computational Linguistics.
  • Hao Xiong, Zhongjun He, Hua Wu, and Haifeng Wang. 2019. Modeling coherence for discourse neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7338–7345.
  • Hongfei Xu, Deyi Xiong, Josef van Genabith, and Qiuhui Liu. 2020. Efficient context-aware neural machine translation with layer-wise weighting and input-aware gating. In Proceedings of the TwentyNinth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3933–3940. International Joint Conferences on Artificial Intelligence Organization.
  • Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, and Jie Zhou. 2019. Enhancing context modeling with a query-guided capsule network for document-level translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1527– 1537, Hong Kong, China. Association for Computational Linguistics.
  • Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, and Yang Liu. 2017. Thumt: An open source toolkit for neural machine translation. arXiv preprint arXiv:1706.06415.
  • Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Min Zhang, and Yang Liu. 2018. Improving the transformer translation model with document-level context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 533–542, Brussels, Belgium. Association for Computational Linguistics.
  • Jiajun Zhang and Chengqing Zong. 2015. Deep neural networks in machine translation: An overview. IEEE Intelligent Systems, (5):16–25.
  • Jiajun Zhang and Chengqing Zong. 2016. Exploiting source-side monolingual data in neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1535–1545, Austin, Texas. Association for Computational Linguistics.
  • Yang Zhao, Jiajun Zhang, Yu Zhou, and Chengqing Zong. 2020. Knowledge graphs enhanced neural machine translation. In Proceedings of the TwentyNinth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 4039–4045. International Joint Conferences on Artificial Intelligence Organization.
  • Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, and Alexandra Birch. 2020. Towards making the most of context in neural machine translation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI20, pages 3983–3989. International Joint Conferences on Artificial Intelligence Organization.
  • Long Zhou, Wenpeng Hu, Jiajun Zhang, and Chengqing Zong. 2017. Neural system combination for machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 378–384, Vancouver, Canada. Association for Computational Linguistics.
  • Long Zhou, Jiajun Zhang, and Chengqing Zong. 2019. Synchronous bidirectional neural machine translation. Transactions of the Association for Computational Linguistics, 7:91–105.
  • We implement all models based on the toolkit THUMT with the parameters of the “base” version of Transformer (Vaswani et al., 2017). Specifically, we use 6 layers for both the encoder and the decoder, with 8 attention heads. The hidden size and feed-forward layer size are 512 and 2,048, respectively. For Zh→En, the Chinese and English vocabulary sizes are 30K and 25K, respectively. For En→De, the source and target sides share a vocabulary of 30K. Chinese sentences are segmented into words by our in-house toolkit. English and German datasets are tokenized and truecased with the Moses toolkit. Words are segmented by byte-pair encoding (Sennrich et al., 2016). These settings are restated in the sketch below.
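The settings above, collected into a single dictionary for reference. The key names are illustrative only and do not follow THUMT's actual configuration schema:

```python
# Transformer "base" hyperparameters as reported in the text above.
# Key names are illustrative and do not follow THUMT's real config format.
TRANSFORMER_BASE = {
    "encoder_layers": 6,
    "decoder_layers": 6,
    "attention_heads": 8,
    "hidden_size": 512,
    "filter_size": 2048,               # feed-forward inner layer size
    "zh_vocab_size": 30_000,           # Zh->En source vocabulary
    "en_vocab_size": 25_000,           # Zh->En target vocabulary
    "ende_shared_vocab_size": 30_000,  # En->De shared source/target vocabulary
    "subword": "byte-pair encoding (Sennrich et al., 2016)",
}
```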