AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

Wolfgang Macherey
Jacob Eisenstein

ACL, pp. 5961-5970, 2020.


Abstract:

In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT). The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, of which the crucial one is a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding...
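The abstract's notions of vicinal risk and vicinity distributions follow Chapelle et al. (2001) and mixup (Zhang et al., 2018). As background, the generic objective can be written as below; this is the standard formulation in generic notation, not the paper's exact adversarial vicinity distribution:

    % Vicinal risk: average loss over virtual pairs drawn from a vicinity
    % distribution \nu around each observed training pair (x, y).
    \tilde{R}(\theta) = \frac{1}{|\tilde{\mathcal{D}}|}
        \sum_{(\tilde{x}, \tilde{y}) \in \tilde{\mathcal{D}}}
        \ell\!\left(f(\tilde{x}; \theta), \tilde{y}\right),
    \qquad (\tilde{x}, \tilde{y}) \sim \nu(\tilde{x}, \tilde{y} \mid x, y)

    % mixup-style vicinity over embeddings e(\cdot): interpolate two examples
    % with a mixing ratio \lambda \sim \mathrm{Beta}(\alpha, \alpha).
    e(\tilde{x}) = \lambda\, e(x_i) + (1 - \lambda)\, e(x_j), \qquad
    e(\tilde{y}) = \lambda\, e(y_i) + (1 - \lambda)\, e(y_j)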

Introduction
Highlights
  • Recent work in neural machine translation (Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017) has led to dramatic improvements in both research and commercial systems (Wu et al., 2016)
  • While constructing semantics-preserving continuous noise in a high-dimensional space proves to be non-trivial, state-of-the-art approaches to robust Neural Machine Translation currently rely on adversarial examples of discrete noise
  • We find that the generated adversarial sentences are unnatural, and, as we will show, suboptimal for learning robust Neural Machine Translation models
  • The decoder in the Neural Machine Translation model acts as a conditional language model that operates on a shifted copy of y, i.e., ⟨sos⟩, y_0, ..., y_{|y|−1}, where ⟨sos⟩ is the start-of-sentence symbol, and on the representations of x learned by the encoder (see the sketch after this list)
  • We introduce a new method to augment the representations of the adversarial examples in sequence-to-sequence training of the Neural Machine Translation model
  • We have presented an approach to augment the training data of Neural Machine Translation models by introducing a new vicinity distribution defined over the interpolated embeddings of adversarial examples
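To make the shifted decoder input mentioned in the highlights concrete, here is a minimal Python sketch; the ⟨sos⟩ symbol and the token handling follow one common convention and are not tied to the authors' implementation:

    # Minimal sketch: for teacher-forced training, the decoder consumes the
    # target sentence shifted right by one position, starting from <sos>.
    SOS = "<sos>"

    def decoder_input(y):
        """Prepend <sos> and drop the final token (conventionally <eos>),
        so the prediction of y[i] conditions only on y[0..i-1]."""
        return [SOS] + list(y[:-1])

    # Example: the target ["I", "like", "tea", "<eos>"] is predicted from
    # the decoder input ["<sos>", "I", "like", "tea"].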
Methods
  • Following Miyato et al. (2017), the authors use adversarial learning to add continuous gradient-based perturbations to source word embeddings and extend it to the Transformer model (a generic sketch of this technique follows this list).
  • Sano et al. (2019) adapt Miyato et al. (2017)'s idea to NMT by adding gradient-based perturbations to both source and target word embeddings and optimizing the model with adversarial training.
  • Adversarial examples are used to both attack and defend the NMT model
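As a rough sketch of the gradient-based perturbation idea used by these baselines (Miyato et al., 2017; Sano et al., 2019), the snippet below perturbs word embeddings in the direction that increases the training loss; the function and its arguments (loss_fn, epsilon) are illustrative assumptions, not the authors' code:

    import torch

    def adversarial_embeddings(embeds, loss_fn, epsilon=1.0):
        """Return embeddings perturbed along the loss gradient, L2-normalized
        and scaled to norm epsilon per position, as in adversarial training."""
        loss = loss_fn(embeds)                      # scalar training loss
        grad, = torch.autograd.grad(loss, embeds)   # d loss / d embeddings
        norm = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
        return embeds + (epsilon * grad / norm).detach()

In such schemes the perturbed embeddings are fed through the model a second time, and the resulting loss is added to the clean-data loss, so the model learns to be invariant to the perturbations.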
Results
  • Chinese-English Translation.
  • Table 1 shows results on the Chinese-English translation task in comparison with six baseline methods.
  • The authors implement all of these baseline methods for comparison.
Conclusion
  • The authors have presented an approach to augment the training data of NMT models by introducing a new vicinity distribution defined over the interpolated embeddings of adversarial examples.
  • To further improve the translation quality, the authors incorporate an existing vicinity distribution, similar to mixup, for observed examples in the training set.
  • The authors design an augmentation algorithm over the virtual sentences sampled from both vicinity distributions in sequence-to-sequence NMT model training (a simplified sketch of the interpolation step follows this list).
  • Experimental results on Chinese-English, English-French, and English-German translation tasks demonstrate that the approach improves both translation performance and robustness.
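A simplified sketch of the interpolation step behind the new vicinity distribution, assuming the two (adversarial) sentence pairs have already been embedded and padded to a common length; the names and the Beta prior follow mixup (Zhang et al., 2018) and are illustrative, not the authors' exact procedure:

    import numpy as np

    def interpolate_pair(src_a, src_b, tgt_a, tgt_b, alpha):
        """Sample one virtual sentence pair on the segment between the
        embedding sequences (arrays of shape [length, dim]) of two examples."""
        lam = np.random.beta(alpha, alpha)  # mixing ratio
        virtual_src = lam * src_a + (1.0 - lam) * src_b
        virtual_tgt = lam * tgt_a + (1.0 - lam) * tgt_b
        return virtual_src, virtual_tgt

The sequence-to-sequence loss is then computed directly on these interpolated embeddings, so no discrete virtual sentences need to be generated.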
Summary
  • Introduction:

    Recent work in neural machine translation (Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017) has led to dramatic improvements in both research and commercial systems (Wu et al., 2016).
  • Two types of noise can be distinguished: (1) continuous noise, which is modeled as a real-valued vector applied to word embeddings (Miyato et al., 2016, 2017; Cheng et al., 2018; Sano et al., 2019), and (2) discrete noise, which adds, deletes, and/or replaces characters or words in the observed sentences (Belinkov and Bisk, 2018; Sperber et al., 2017; Ebrahimi et al., 2018; Michel et al., 2019; Cheng et al., 2019; Karpukhin et al., 2019); a toy example of discrete noise appears after this list
  • In both cases, the challenge is to ensure that the noisy examples are still semantically valid translation pairs.
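To make the distinction concrete, the toy function below injects discrete noise by randomly deleting or replacing tokens; it is a generic illustration, not a specific attack from the papers cited above. Continuous noise would instead add a real-valued vector to each token's embedding.

    import random

    def add_discrete_noise(tokens, vocab, p_delete=0.05, p_replace=0.05):
        """Randomly delete or replace tokens in a tokenized sentence."""
        noisy = []
        for tok in tokens:
            r = random.random()
            if r < p_delete:
                continue                            # drop this token
            elif r < p_delete + p_replace:
                noisy.append(random.choice(vocab))  # substitute a random word
            else:
                noisy.append(tok)
        return noisy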
Tables
  • Table 1: Baseline comparison on NIST Chinese-English translation. * indicates that the model uses extra corpora; a separate marker denotes models whose training loss is not elaborated on
  • Table 2: Results on IWSLT16 English-French and WMT14 English-German translation
  • Table 3: Translation examples from the Transformer and our model for an input and its adversarial input
  • Table 4: Effect of α on the Chinese-English validation set. “-” indicates that the model fails to converge
  • Table 5: Results on artificial noisy inputs. Columns list results for different noise fractions
References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
  • Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In International Conference on Learning Representations.
  • David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. 2019. MixMatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249.
  • Olivier Chapelle, Jason Weston, Léon Bottou, and Vladimir Vapnik. 2001. Vicinal risk minimization. In Advances in Neural Information Processing Systems, pages 416–422.
  • Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust neural machine translation with doubly adversarial inputs. In Association for Computational Linguistics.
  • Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. Towards robust neural machine translation. In Association for Computational Linguistics.
  • Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Semi-supervised learning for neural machine translation. In Association for Computational Linguistics.
  • Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Yonatan Belinkov, and Preslav Nakov. 2019. One size does not fit all: Comparing NMT representations of different granularities. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018. On adversarial examples for character-level neural machine translation. In Proceedings of COLING.
  • Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. In Empirical Methods in Natural Language Processing.
  • Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. Data augmentation for low-resource neural machine translation. In Association for Computational Linguistics.
  • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In International Conference on Machine Learning.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems.
  • Lu Jiang, Di Huang, and Weilong Yang. 2019. Synthetic vs real: Deep learning on controlled noise. arXiv preprint arXiv:1911.09781.
  • Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. 2018. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In International Conference on Machine Learning.
  • Vladimir Karpukhin, Omer Levy, Jacob Eisenstein, and Marjan Ghazvininejad. 2019. Training on synthetic noise improves robustness to natural noise in machine translation. arXiv preprint arXiv:1902.01509.
  • Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, and Hassan Sajjad. 2019. Findings of the first shared task on machine translation robustness. arXiv preprint arXiv:1906.11943.
  • Paul Michel, Xian Li, Graham Neubig, and Juan Pino. 2019. On evaluation of adversarial perturbations for sequence-to-sequence models. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Takeru Miyato, Andrew M. Dai, and Ian Goodfellow. 2017. Adversarial training methods for semi-supervised text classification. In International Conference on Learning Representations.
  • Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. 2016. Distributional smoothing with virtual adversarial training. In International Conference on Learning Representations.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Association for Computational Linguistics.
  • Motoki Sano, Jun Suzuki, and Shun Kiyono. 2019. Effective adversarial regularization for neural machine translation. In Association for Computational Linguistics.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Improving neural machine translation models with monolingual data. In Association for Computational Linguistics.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016b. Neural machine translation of rare words with subword units. In Association for Computational Linguistics.
  • Matthias Sperber, Jan Niehues, and Alex Waibel. 2017. Toward robust neural machine translation for noisy input sequences. In International Workshop on Spoken Language Translation.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.
  • Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. 2018. SwitchOut: an efficient data augmentation algorithm for neural machine translation. In Empirical Methods in Natural Language Processing.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
  • Jiajun Zhang and Chengqing Zong. 2016. Exploiting source-side monolingual data in neural machine translation. In Empirical Methods in Natural Language Processing.