Toward Making the Most of Context in Neural Machine Translation

IJCAI 2020, pp. 3983-3989, 2020.

DOI: https://doi.org/10.24963/ijcai.2020/551
Other Links: arxiv.org|dblp.uni-trier.de|academic.microsoft.com

Abstract:

Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with the awareness of the global context of the document, in both source and target languages.

Introduction
  • Recent studies suggest that neural machine translation (NMT) [Sutskever et al, 2014; Bahdanau et al, 2015; Vaswani et al, 2017] has achieved human parity, especially on resource-rich language pairs [Hassan et al, 2018].
  • The representation of each word in the current sentence is a deep hybrid of both global document context and local sentence context in every layer
  • The authors notice that these hybrid encoding approaches have two main weaknesses.
  • Formally, given a parallel corpus of $M$ document pairs $\{(X^{(m)}, Y^{(m)})\}_{m=1}^{M}$, where $Y^{(m)} = \{y^{(m)}_k\}_{k=1}^{n}$ is a target document with $n$ sentences, the training criterion for the document-level NMT model (DOCNMT) is to maximize the conditional log-likelihood over the document pairs, translated sentence by sentence: $\mathcal{L}(\mathcal{D}; \theta) = \sum_{m=1}^{M} \log p(Y^{(m)} \mid X^{(m)}; \theta) = \sum_{m=1}^{M} \sum_{k=1}^{n} \log p(y^{(m)}_k \mid y^{(m)}_{<k}, X^{(m)}; \theta)$ (a minimal sketch of this objective follows this list).
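To make the reconstructed objective concrete, here is a minimal sketch of the per-document loss: it sums sentence-level token log-likelihoods while growing the target-side context sentence by sentence. The `model(src_doc, tgt_context, tgt_input)` interface, the tensor shapes, and the toy stand-in model are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def doc_neg_log_likelihood(model, src_doc, tgt_doc, pad_id=0):
    """Negative of  sum_k log p(y_k | y_<k, X; theta)  for one document pair.

    src_doc: (1, S) source-document token ids; tgt_doc: list of (1, T_k)
    target-sentence token ids. `model` is a hypothetical callable returning
    per-token logits of shape (1, T_k - 1, V).
    """
    nll = 0.0
    tgt_context = []                                   # y_<k, grows per sentence
    for y_k in tgt_doc:                                # translate sentence by sentence
        logits = model(src_doc, tgt_context, y_k[:, :-1])
        nll = nll + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),       # (T_k - 1, V)
            y_k[:, 1:].reshape(-1),                    # gold next tokens
            ignore_index=pad_id,
            reduction="sum",
        )
        tgt_context.append(y_k)
    return nll                                         # minimizing this maximizes L(D; θ)

# Toy usage with a stand-in "model" that returns random logits over a vocabulary of 100.
model = lambda src, ctx, y_in: torch.randn(1, y_in.size(1), 100)
src = torch.randint(1, 100, (1, 20))
tgt = [torch.randint(1, 100, (1, 6)) for _ in range(3)]
print(doc_neg_log_likelihood(model, src, tgt))
```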
Highlights
  • Recent studies suggest that neural machine translation (NMT) [Sutskever et al, 2014; Bahdanau et al, 2015; Vaswani et al, 2017] has achieved human parity, especially on resource-rich language pairs [Hassan et al, 2018]
  • Standard NMT systems are designed for sentence-level translation, and so cannot consider the dependencies among sentences or translate entire documents
  • The representation of each word in the current sentence is a deep hybrid of both global document context and local sentence context in every layer
  • We propose a new NMT framework that is able to deal with documents containing any number of sentences, including single-sentence documents, making training and deployment simpler and more flexible
  • Segment-level Relative Attention: Given the local representations of each sentence, we propose to extend relative attention [Shaw et al., 2018] from the token level to the segment level to model the inter-sentence global context: $h^G = \mathrm{MultiHead}(\text{Seg-Attn}(h^L, h^L, h^L))$, where $\text{Seg-Attn}(Q, K, V)$ denotes the proposed segment-level relative attention (a simplified sketch follows this list)
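The following is a simplified, single-head sketch of segment-level relative attention as described above: token-level relative attention in the style of Shaw et al. [2018], but with the relative distance measured between the sentences (segments) that the query and key tokens belong to. The clipping distance, the single-head formulation, and the exact way the relative embeddings enter the scores and values are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentRelativeAttention(nn.Module):
    def __init__(self, d_model: int, max_rel_seg: int = 8):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.max_rel_seg = max_rel_seg
        # One embedding per clipped relative *sentence* distance in
        # [-max_rel_seg, +max_rel_seg], added to the keys and values.
        self.rel_k = nn.Embedding(2 * max_rel_seg + 1, d_model)
        self.rel_v = nn.Embedding(2 * max_rel_seg + 1, d_model)
        self.scale = d_model ** -0.5

    def forward(self, h_local: torch.Tensor, seg_ids: torch.Tensor) -> torch.Tensor:
        # h_local: (L, d) local token representations of a whole document
        # seg_ids: (L,)  index of the sentence each token belongs to
        q, k, v = self.q_proj(h_local), self.k_proj(h_local), self.v_proj(h_local)

        # Relative *segment* distance between every query/key pair,
        # clipped as in token-level relative attention.
        rel = seg_ids[None, :] - seg_ids[:, None]                  # (L, L)
        rel = rel.clamp(-self.max_rel_seg, self.max_rel_seg) + self.max_rel_seg
        a_k = self.rel_k(rel)                                      # (L, L, d)
        a_v = self.rel_v(rel)                                      # (L, L, d)

        # e_ij = q_i · (k_j + a^K_ij) / sqrt(d)
        scores = (q @ k.t() + torch.einsum("id,ijd->ij", q, a_k)) * self.scale
        attn = F.softmax(scores, dim=-1)                           # (L, L)

        # Global context: weighted sum of (v_j + a^V_ij)
        h_global = attn @ v + torch.einsum("ij,ijd->id", attn, a_v)
        return self.out_proj(h_global)


# Toy usage: a 3-sentence "document" of 10 tokens with d_model = 16.
h = torch.randn(10, 16)
segs = torch.tensor([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
print(SegmentRelativeAttention(16)(h, segs).shape)  # torch.Size([10, 16])
```

A multi-head version would simply split the projections into heads before computing the same segment-biased scores; it is omitted here to keep the sketch short.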
Results
  • Document-level Translation: The authors list the results of the experiments in Table 1, comparing their FINAL model with the sentence-level Transformer (SENTNMT) [Vaswani et al., 2017] and four context-aware NMT models, i.e., the Document-aware Transformer (DocT) [Zhang et al., 2018], Hierarchical Attention NMT (HAN) [Miculicich et al., 2018], Selective Attention NMT (SAN) [Maruf et al., 2019], and the Query-guided Capsule Network (QCN) [Yang et al., 2019], together with each model's parameter increment (∆|θ|) over the Transformer baseline.
Conclusion
  • Does Bilingual Context Really Matter?
  • Yes. To investigate how important the bilingual context is and the corresponding contribution of each component, the authors summarize the ablation study in Table 3.
  • Modeling target context: In this paper, the authors propose a unified local and global NMT framework, which can successfully exploit context regardless of how many sentences are in the input.
  • Extensive experimentation and analysis show that the model has learned to leverage a larger context.
  • In future work, the authors will investigate the feasibility of extending the approach to other document-level NLP tasks, e.g., summarization.
Tables
  • Table 1: Experiment results of our model in comparison with several baselines, including increments of the number of parameters over the Transformer baseline (∆|θ|), training/testing speeds (v_train/v_test; some of them are derived from Maruf et al. [2019]), and translation results on the test sets in BLEU
  • Table 2: Results of sentence-level translation on TED ZH-EN
  • Table 3: Ablation study on modeling context on the TED ZH-EN development set. "Doc" means using an entire document as a single sequence for input or output. BLEU_doc indicates the document-level BLEU score calculated on the concatenation of all output sentences (a minimal sketch of this computation follows this list)
  • Table 4: Effect of transfer learning (TL)
  • Table 5: Accuracy (%) of discourse phenomena. ∗ indicates different data and system conditions, only for reference
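As a concrete illustration of the BLEU_doc metric mentioned in the Table 3 caption, here is a minimal sketch that concatenates each document's output (and reference) sentences into one line and scores them with sacrebleu. The document grouping and the whitespace join are assumptions about how such a score could be computed, not the authors' exact evaluation script.

```python
import sacrebleu

def doc_bleu(sys_docs, ref_docs):
    """sys_docs / ref_docs: lists of documents, each a list of sentence strings."""
    sys_lines = [" ".join(doc) for doc in sys_docs]   # one line per document
    ref_lines = [" ".join(doc) for doc in ref_docs]
    return sacrebleu.corpus_bleu(sys_lines, [ref_lines]).score

# Identical hypothesis and reference documents give a BLEU of 100.
print(doc_bleu([["Hello world .", "How are you ?"]],
               [["Hello world .", "How are you ?"]]))
```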
Related work
  • Context beyond the current sentence is crucial for machine translation. Bawden et al. [2018], Laubli et al. [2018], Muller et al. [2018], and Voita et al. [2018] show that, without access to document-level context, NMT is likely to fail to maintain lexical, tense, deixis, and ellipsis consistency or to resolve anaphoric pronouns and other discourse phenomena, and they propose corresponding test sets for evaluating discourse phenomena in NMT.

    Most current document-level NMT models fall into two main categories: context-aware models and post-processing models. Post-processing models introduce an additional module that learns to refine the translations produced by context-agnostic NMT systems so that they are more coherent at the discourse level [Xiong et al., 2019; Voita et al., 2019]. While this kind of approach is easy to deploy, the two-stage generation process may result in error accumulation.

    In this paper, we focus mainly on context-aware models, noting that post-processing approaches can be combined with, and complement, any NMT architecture. Tiedemann and Scherrer [2017] and Junczys-Dowmunt [2019] use the concatenation of multiple sentences (usually a small number of preceding sentences) as the NMT input/output (a small sketch of this strategy is given after this paragraph). Going beyond simple concatenation, Jean et al. [2017] introduce a separate context encoder for a few previous source sentences, and Wang et al. [2017] use a hierarchical RNN to summarize source-side context. Other approaches use a dynamic cache memory to store representations of previously translated content [Tu et al., 2018; Kuang et al., 2018; Kuang and Xiong, 2018; Maruf and Haffari, 2018]. Miculicich et al. [2018], Zhang et al. [2018], Yang et al. [2019], Maruf et al. [2019], and Tan et al. [2019] extend context-aware models to the Transformer architecture with additional context-related modules.
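As a small illustration of the concatenation strategy attributed to Tiedemann and Scherrer [2017] above, the sketch below prepends a window of previous sentences to each source sentence, separated by a break token. The `<BRK>` separator and the window size of two are illustrative assumptions rather than the exact setup of any cited system.

```python
def concat_with_context(sentences, window=2, sep=" <BRK> "):
    """sentences: tokenized sentence strings from one document, in order."""
    examples = []
    for i, sent in enumerate(sentences):
        context = sentences[max(0, i - window):i]      # up to `window` previous sentences
        examples.append(sep.join(context + [sent]))
    return examples

doc = ["it was raining .", "she took an umbrella .", "the umbrella broke ."]
for ex in concat_with_context(doc):
    print(ex)
# it was raining .
# it was raining . <BRK> she took an umbrella .
# it was raining . <BRK> she took an umbrella . <BRK> the umbrella broke .
```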
Funding
  • This work was supported by the National Science Foundation of China (No U1836221, 61772261, 61672277)
  • Zaixiang Zheng was also supported by China Scholarship Council (No 201906190162)
  • Alexandra Birch was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreements No 825299 (GoURMET) and also by the UK EPSRC fellowship grant EP/S001271/1 (MTStretch)
Reference
  • [Bahdanau et al., 2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
  • [Bawden et al., 2018] Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. Evaluating discourse phenomena in neural machine translation. In NAACL-HLT, 2018.
  • [Dai et al., 2019] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan R. Salakhutdinov. Transformer-XL: Attentive language models beyond a fixed-length context. In ACL, 2019.
  • [Hassan et al., 2018] Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, et al. Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:1803.05567, 2018.
  • [Jean et al., 2017] Sebastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. Does neural machine translation benefit from larger context? CoRR, abs/1704.05135, 2017.
  • [Junczys-Dowmunt, 2019] Marcin Junczys-Dowmunt. Microsoft Translator at WMT 2019: Towards large-scale document-level neural machine translation. In WMT, 2019.
  • [Kingma and Ba, 2014] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2014.
  • [Kuang and Xiong, 2018] Shaohui Kuang and Deyi Xiong. Fusing recency into neural machine translation with an inter-sentence gate model. In COLING, 2018.
  • [Kuang et al., 2018] Shaohui Kuang, Deyi Xiong, Weihua Luo, and Guodong Zhou. Modeling coherence for neural machine translation with dynamic and topic caches. In COLING, 2018.
  • [Laubli et al., 2018] Samuel Laubli, Rico Sennrich, and Martin Volk. Has machine translation achieved human parity? A case for document-level evaluation. In EMNLP, 2018.
  • [Li et al., 2019] Liangyou Li, Xin Jiang, and Qun Liu. Pretrained language models for document-level neural machine translation. arXiv preprint, 2019.
  • [Maruf and Haffari, 2018] Sameen Maruf and Gholamreza Haffari. Document context neural machine translation with memory networks. In ACL, 2018.
  • [Maruf et al., 2019] Sameen Maruf, Andre F. T. Martins, and Gholamreza Haffari. Selective attention for context-aware neural machine translation. In NAACL-HLT, 2019.
  • [Miculicich et al., 2018] Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, and James Henderson. Document-level neural machine translation with hierarchical attention networks. In EMNLP, 2018.
  • [Muller et al., 2018] Mathias Muller, Annette Rios, Elena Voita, and Rico Sennrich. A large-scale test set for the evaluation of context-aware pronoun translation in neural machine translation. In WMT, 2018.
  • [Papineni et al., 2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In ACL, 2002.
  • [Sennrich et al., 2016] Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In ACL, 2016.
  • [Shaw et al., 2018] Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. Self-attention with relative position representations. In NAACL-HLT, 2018.
  • [Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
  • [Szegedy et al., 2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
  • [Tan et al., 2019] Xin Tan, Longyin Zhang, Deyi Xiong, and Guodong Zhou. Hierarchical modeling of global context for document-level neural machine translation. In EMNLP-IJCNLP, 2019.
  • [Tiedemann and Scherrer, 2017] Jorg Tiedemann and Yves Scherrer. Neural machine translation with extended context. In DiscoMT, 2017.
  • [Tu et al., 2018] Zhaopeng Tu, Yang Liu, Shuming Shi, and Tong Zhang. Learning to remember translation history with a continuous cache. TACL, 2018.
  • [Vaswani et al., 2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, 2017.
  • [Voita et al., 2018] Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. Context-aware neural machine translation learns anaphora resolution. In ACL, 2018.
  • [Voita et al., 2019] Elena Voita, Rico Sennrich, and Ivan Titov. Context-aware monolingual repair for neural machine translation. In EMNLP-IJCNLP, 2019.
  • [Wang et al., 2017] Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. Exploiting cross-sentence context for neural machine translation. In EMNLP, 2017.
  • [Xiong et al., 2019] Hao Xiong, Zhongjun He, Hua Wu, and Haifeng Wang. Modeling coherence for discourse neural machine translation. In AAAI, 2019.
  • [Yang et al., 2019] Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, and Jie Zhou. Enhancing context modeling with a query-guided capsule network for document-level translation. In EMNLP-IJCNLP, 2019.
  • [Zhang et al., 2018] Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang, and Yang Liu. Improving the Transformer translation model with document-level context. In EMNLP, 2018.