Robust Neural Machine Translation with Doubly Adversarial Inputs
Meeting of the Association for Computational Linguistics, 2019.
Abstract:
Neural machine translation (NMT) often suffers from the vulnerability to noisy perturbations in the input. We propose an approach to improving the robustness of NMT models, which consists of two parts: (1) attack the translation model with adversarial source examples; (2) defend the translation model with adversarial target inputs to improve its robustness against the adversarial source inputs.
Introduction
- Neural machine translation (NMT) has achieved tremendous success in advancing the quality of machine translation (Wu et al, 2016; Hieber et al, 2017).
- Belinkov and Bisk (2018) found that NMT models can be immensely brittle to small perturbations applied to the inputs.
- Even if these perturbations are not strong enough to alter the meaning of an input sentence, they can result in different and often incorrect translations.
- Conditioned on the hidden representations h and the target input z, the decoder generates y as P(y | z, x) = ∏_{j=1}^{J} P(y_j | z_{<j}, h), where J is the length of the target sentence (a toy illustration of this factorization follows).
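A toy, self-contained illustration of this factorization, assuming nothing about the authors' implementation: the step probability below is a stand-in for the real Transformer decoder softmax, and the tokens and values are made up. It only shows how the sentence probability decomposes over positions and how a perturbed target input z changes the result.

```python
# Toy illustration of the factorization above: the decoder consumes the target
# input z (a shifted, possibly perturbed copy of y) together with the source
# representations h, and the sentence probability factorizes over positions.
# step_prob is a placeholder, not the paper's model.
import math

CLEAN_Z = ["<s>", "das", "ist"]  # clean target input under teacher forcing

def step_prob(y_j, z_prefix, h):
    # Placeholder conditional P(y_j | z_<j, h): slightly lower when the target
    # prefix has been perturbed, just so the two calls below differ.
    return 0.9 if list(z_prefix) == CLEAN_Z[:len(z_prefix)] else 0.6

def sentence_log_prob(y, z, h):
    # log P(y | z, x) = sum over j of log P(y_j | z_<j, h)
    return sum(math.log(step_prob(y[j], z[:j], h)) for j in range(len(y)))

y = ["das", "ist", "gut"]
print(sentence_log_prob(y, CLEAN_Z, h="h"))                 # clean target input z
print(sentence_log_prob(y, ["<s>", "dies", "ist"], h="h"))  # adversarial target input z'
```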
Highlights
- In recent years, neural machine translation (NMT) has achieved tremendous success in advancing the quality of machine translation (Wu et al, 2016; Hieber et al, 2017)
- We propose a gradient-based method, AdvGen, to construct adversarial examples guided by the final translation loss from the clean inputs of a neural machine translation model (a minimal sketch follows after this list)
- This subsection validates the robustness of the neural machine translation models against artificial noise
- We have presented an approach to improving the robustness of the neural machine translation models with doubly adversarial inputs
- We have introduced a white-box method to generate adversarial examples for neural machine translation
- We plan to explore generating more natural adversarial examples that dispense with word replacements, as well as more advanced defense approaches such as curriculum learning (Jiang et al, 2015, 2018)
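The following is a minimal, hypothetical sketch of the gradient-guided word replacement behind AdvGen, not the authors' implementation: the toy loss stands in for the NMT translation loss, and the candidate set here is the full vocabulary, whereas the paper restricts the candidates (e.g. with a language model for target-side replacements).

```python
# Gradient-guided word replacement, AdvGen-style (illustrative sketch).
import torch

torch.manual_seed(0)
vocab_size, dim, seq_len = 100, 16, 6

embedding = torch.nn.Embedding(vocab_size, dim)
src_ids = torch.randint(0, vocab_size, (seq_len,))

def translation_loss(src_emb):
    # Stand-in for -log P(y | x): any differentiable function of the source embeddings.
    return src_emb.sum(dim=0).pow(2).sum()

src_emb = embedding(src_ids).detach().requires_grad_(True)
loss = translation_loss(src_emb)
loss.backward()
grad = src_emb.grad  # gradient of the translation loss w.r.t. each source embedding

# For one sampled position, pick the candidate word whose embedding shift best
# aligns with the gradient, i.e. the replacement that most increases the loss.
pos = 2
candidates = torch.arange(vocab_size)  # the paper restricts this set; here it is the full vocabulary
shift = embedding.weight[candidates] - src_emb[pos].detach()
scores = torch.nn.functional.cosine_similarity(shift, grad[pos].unsqueeze(0), dim=-1)
adv_ids = src_ids.clone()
adv_ids[pos] = candidates[scores.argmax()]
print("original :", src_ids.tolist())
print("perturbed:", adv_ids.tolist())
```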
Methods
- Table 2 compares against baselines (Vaswani et al, 2017; Miyato et al, 2017; Sennrich et al, 2016a; Wang et al, 2018; Cheng et al, 2018; Sennrich et al, 2016b*) trained on different backbone models (Trans.-Base or RNMT lex.), alongside Ours and Ours + BackTranslation*; * marks methods trained with an extra corpus.
- The authors compare the approach with Transformer for different numbers of hidden units (i.e. 1024 and 512) and a related RNN-based NMT model, RNMT+ (Chen et al, 2018).
- As shown in Table 4, the approach achieves improvements over the Transformer for the same number of hidden units, i.e. 1.04 BLEU points over Trans.-Base, 1.61 BLEU points over Trans.-Big, and 1.52 BLEU points over the RNMT+ model.
- The notable BLEU gains confirm the effectiveness of the approach on English-German translation.
- Compared to Miyato et al (2017), the authors found that continuous gradient-based perturbations to word embeddings can be absorbed quickly, often resulting in a worse BLEU score than the proposed discrete perturbations by word replacement (the contrast is sketched below).
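For contrast, a continuous perturbation in the style of Miyato et al (2017) is added directly to the word embeddings rather than mapped back to real vocabulary items. A minimal sketch, in which the epsilon value and stand-in gradients are illustrative assumptions:

```python
# Continuous, Miyato-style adversarial perturbation of word embeddings,
# shown only to contrast with the discrete replacement above (illustrative).
import torch

def continuous_perturbation(embedding_grad, epsilon=0.5):
    # Scale the L2-normalized gradient; the result is added to the embeddings
    # themselves, so the perturbed input no longer corresponds to real words.
    return epsilon * embedding_grad / (embedding_grad.norm(dim=-1, keepdim=True) + 1e-12)

# Stand-in gradients for a 6-token sentence with 16-dimensional embeddings.
grad = torch.randn(6, 16)
print(continuous_perturbation(grad).shape)  # torch.Size([6, 16])
```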
Results
- Chinese-English and English-German translation benchmarks show that the approach yields improvements of 2.8 and 1.6 BLEU points, respectively, over state-of-the-art models including Transformer (Vaswani et al, 2017).
- This result substantiates that the model improves the generalization performance over the clean benchmark datasets.
- The authors re-scored those sentences using a pre-trained bidirectional language model and picked the best one as the noisy input (a sketch of this construction follows).
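A minimal sketch of this noisy-input construction: the LM scorer below is a placeholder for the pre-trained bidirectional language model, and the candidate-generation details (random in-vocabulary replacements, ten candidates) are illustrative assumptions rather than the authors' exact recipe.

```python
# Build a noisy input by generating replacement candidates and keeping the one
# a language model scores best (illustrative sketch).
import random

random.seed(0)

def lm_score(tokens):
    # Placeholder: a real system would return the bidirectional LM's score here.
    return -len(set(tokens))

def make_noisy_input(sentence, vocab, noise_fraction=0.2, n_candidates=10):
    tokens = sentence.split()
    n_replace = max(1, int(noise_fraction * len(tokens)))
    candidates = []
    for _ in range(n_candidates):
        cand = list(tokens)
        for pos in random.sample(range(len(cand)), n_replace):
            cand[pos] = random.choice(vocab)  # random in-vocabulary word replacement
        candidates.append(cand)
    # Re-score the candidates and keep the one the LM prefers as the noisy input.
    return " ".join(max(candidates, key=lm_score))

print(make_noisy_input("the quick brown fox jumps over the lazy dog",
                       vocab=["red", "cat", "runs", "small"]))
```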
Conclusion
- The authors have presented an approach to improving the robustness of the NMT models with doubly adversarial inputs.
- The authors have introduced a white-box method to generate adversarial examples for NMT.
- The authors plan to explore generating more natural adversarial examples that dispense with word replacements, as well as more advanced defense approaches such as curriculum learning (Jiang et al, 2015, 2018)
Tables
- Table1: An example of a Transformer NMT translation for an input and its perturbed input, obtained by replacing "他" (he) with "她" (she)
- Table2: Comparison with baseline methods trained on different backbone models (second column). * indicates a method trained using an extra corpus
- Table3: Results on NIST Chinese-English translation
- Table4: Results on WMT’14 English-German translation
- Table5: Comparison of translation results of Transformer and our model for an input and its perturbed input
- Table6: Results on artificial noisy inputs. The columns list results for different noise fractions
- Table7: BLEU scores computed using the zero noise fraction output as a reference
- Table8: Ablation study on Chinese-English translation. A check mark means the corresponding component is included in training
- Table9: Effect of the ratio values γsrc and γtrg on Chinese-English translation
Reference
- Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. In Empirical Methods in Natural Language Processing.
- Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
- Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, and Yonghui Wu. 2018. Training deeper neural machine translation models with transparent attention. arXiv preprint arXiv:1808.07561.
- Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In International Conference on Learning Representations.
- Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Niki Parmar, Mike Schuster, Zhifeng Chen, et al. 2018. The best of both worlds: Combining recent advances in neural machine translation. In Association for Computational Linguistics.
- Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. Towards robust neural machine translation. In Association for Computational Linguistics.
- Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. Semisupervised learning for neural machine translation. In Association for Computational Linguistics.
- Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018a. On adversarial examples for character-level neural machine translation. In Proceedings of COLING.
- Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018b. Hotflip: White-box adversarial examples for text classification. In Association for Computational Linguistics.
- Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. In Empirical Methods in Natural Language Processing.
- Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. Data augmentation for low-resource neural machine translation. In Association for Computational Linguistics.
- Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional sequence to sequence learning. In International Conference on Machine Learning.
- Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
- Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual learning for machine translation. In Advances in Neural Information Processing Systems, pages 820–828.
- Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2017. Sockeye: A toolkit for neural machine translation. arXiv preprint arXiv:1712.05690.
- Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Empirical Methods in Natural Language Processing.
- Lu Jiang, Deyu Meng, Qian Zhao, Shiguang Shan, and Alexander G Hauptmann. 2015. Self-paced curriculum learning. In AAAI Conference on Artificial Intelligence.
- Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. 2018. MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In International Conference on Machine Learning.
- Vladimir Karpukhin, Omer Levy, Jacob Eisenstein, and Marjan Ghazvininejad. 2019. Training on synthetic noise improves robustness to natural noise in machine translation. arXiv preprint arXiv:1902.01509.
- Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter, and Dan Jurafsky. 2017. Adversarial learning for neural dialogue generation. In Empirical Methods in Natural Language Processing.
- Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. 2018. Robust neural machine translation with joint textual and phonetic embedding. arXiv preprint arXiv:1810.06729.
- Paul Michel and Graham Neubig. 2018. Mtnt: A testbed for machine translation of noisy text. arXiv preprint arXiv:1809.00388.
- Takeru Miyato, Andrew M Dai, and Ian Goodfellow. 2017. Adversarial training methods for semisupervised text classification. In International Conference on Learning Representations.
- Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition.
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Association for Computational Linguistics.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Edinburgh neural machine translation systems for WMT 16. arXiv preprint arXiv:1606.02891.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016b. Improving neural machine translation models with monolingual data. In Association for Computational Linguistics.
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016c. Neural machine translation of rare words with subword units. In Association for Computational Linguistics.
- Matthias Sperber, Jan Niehues, and Alex Waibel. 2017. Toward robust neural machine translation for noisy input sequences. In International Workshop on Spoken Language Translation.
- Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.
- Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. 2018. Switchout: an efficient data augmentation algorithm for neural machine translation. In Empirical Methods in Natural Language Processing.
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
- Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. Generating natural adversarial examples. In International Conference on Learning Representations.