Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

W-NUT@EMNLP, pp. 42-47, 2019.

DOI: https://doi.org/10.18653/v1/D19-5506

Abstract:

We consider the problem of making machine translation more robust to character-level variation at the source side, such as typos. Existing methods achieve greater coverage by applying subword models such as byte-pair encoding (BPE) and character-level encoders, but these methods are highly sensitive to spelling mistakes. We show how training on synthetic character-level noise, similar in spirit to dropout, can substantially improve robustness to natural spelling mistakes, with minimal impact on performance on clean text.

Introduction
  • Machine translation systems are generally trained on clean data, without spelling errors.
  • Systems trained on clean data generally perform poorly when faced with such errors at test time (Heigold et al., 2017; Belinkov and Bisk, 2018).
  • Using synthetic noise at training time has been found to improve performance only on test data with exactly the same kind of synthetic noise, while at the same time impairing performance on clean test data (Heigold et al., 2017; Belinkov and Bisk, 2018).
  • The authors desire methods that perform well on both clean text and naturally-occurring noise, but this is beyond the current state of the art.
Highlights
  • Machine translation systems are generally trained on clean data, without spelling errors
  • Drawing inspiration from dropout and noise-based regularization methods, we explore the space of random noising methods at training time, and evaluate performance on both clean text and text corrupted by “natural noise” found in real spelling errors
  • We find that by feeding our translation models a balanced diet of several types of synthetic noise at training time, it is possible to obtain substantial improvements on such naturally noisy data, with minimal impact on the performance on clean data, and without accessing the test noise data or even its distribution (a minimal code sketch of this training-time noising follows this list)
  • We show how training on synthetic character-level noise, similar in spirit to dropout, can significantly improve a translation model’s robustness to natural spelling mistakes
  • We conjecture that spelling mistakes constitute a small part of the deviations from standard text, and that the main challenges in this domain stem from other linguistic phenomena
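The summary describes the method in prose only; the following is a minimal Python sketch of what such training-time noising could look like, based on the four noise types of Table 1 (deleting, inserting, or substituting a random character, or swapping two adjacent characters). The 10% per-token noising rate, the lowercase alphabet used for insertions and substitutions, and all function names are illustrative assumptions, not the authors' implementation.

    import random
    import string

    # Alphabet used for random insertions/substitutions (an assumption;
    # the paper's character set is not specified in this summary).
    ALPHABET = string.ascii_lowercase

    def delete_char(w, i):
        return w[:i] + w[i + 1:]

    def insert_char(w, i):
        return w[:i] + random.choice(ALPHABET) + w[i:]

    def substitute_char(w, i):
        return w[:i] + random.choice(ALPHABET) + w[i + 1:]

    def swap_chars(w, i):
        # Swap the character at position i with its right neighbor.
        j = min(i + 1, len(w) - 1)
        chars = list(w)
        chars[i], chars[j] = chars[j], chars[i]
        return "".join(chars)

    NOISE_TYPES = [delete_char, insert_char, substitute_char, swap_chars]

    def noise_token(token, p=0.1):
        """With probability p, apply one uniformly chosen noise type to a
        uniformly chosen character position (per Table 1)."""
        if len(token) < 2 or random.random() > p:
            return token
        op = random.choice(NOISE_TYPES)
        return op(token, random.randrange(len(token)))

    def noise_sentence(sentence, p=0.1):
        return " ".join(noise_token(tok, p) for tok in sentence.split())

In such a setup the noising would presumably be applied to source sentences before subword segmentation, so that BPE and character-level encoders see the corrupted surface forms during training.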
Results
  • Table 2 shows the model’s performance on data with varying amounts of natural errors.
  • As observed in prior work (Heigold et al., 2017; Belinkov and Bisk, 2018), when the test data contains substantial amounts of natural noise, the model’s performance drops sharply.
  • Training on the synthetic noise cocktail greatly improves performance, regaining between 19% and 54% of the performance lost to natural noise across the de-en, fr-en, and cs-en datasets; Table 2 reports, for each language pair, the scores with and without synthetic training noise, the difference ∆, and the percentage of the gap recovered (a sketch of how such a recovery percentage can be computed follows this list).
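The “%Recovered” figure can be read as the share of the clean-versus-noised performance gap that is won back by training on synthetic noise. The exact bookkeeping behind Table 2 is not reproduced in this summary, so the helper below is an assumed definition rather than the authors' evaluation script; the numbers in the usage comment are invented for illustration.

    def percent_recovered(clean_score, noised_score, noised_score_with_synthetic):
        """Assumed definition: share of the score lost to natural noise
        that is regained by adding synthetic noise at training time."""
        gap = clean_score - noised_score
        regained = noised_score_with_synthetic - noised_score
        return 100.0 * regained / gap

    # Invented example: 34.0 BLEU on clean text, 24.0 on noised text,
    # 28.0 on noised text after synthetic-noise training -> 40% recovered.
    print(percent_recovered(34.0, 24.0, 28.0))  # 40.0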
Conclusion
  • This work takes a step towards making machine translation robust to character-level noise.
  • The authors show how training on synthetic character-level noise, similar in spirit to dropout, can significantly improve a translation model’s robustness to natural spelling mistakes.
  • While the method works well on misspellings, it does not appear to generalize to non-standard text in social media.
  • The authors conjecture that spelling mistakes constitute a small part of the deviations from standard text, and that the main challenges in this domain stem from other linguistic phenomena
Tables
  • Table1: The synthetic noise types applied during training. Noise is applied on a random character, selected from a uniform distribution. The right column illustrates the application of each noise type on the word “whale.”
  • Table2: Performance on the IWSLT 2016 translation task with varying rates of natural noise in the test set. Noise Probability is the probability of attempting to apply natural noise to a test token, while Noised Tokens is the fraction of tokens that were noised in practice; not every word in the vocabulary has a corresponding misspelling (a sketch of this evaluation-time noising follows this list)
  • Table3: Performance on IWSLT 2016 de-en test with maximal natural noise when training with one noise type (top) and three noise types (bottom)
  • Table4: The proportion of natural errors caused by deleting/inserting/substituting a single character or swapping two adjacent characters
  • Table5: The performance of a machine translation model on the MTNT task
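Table 2 distinguishes the probability of attempting to noise a test token (“Noise Probability”) from the fraction of tokens actually changed (“Noised Tokens”). The sketch below illustrates why the two differ when natural errors come from a lookup of observed misspellings: tokens with no known misspelling are left untouched. The lexicon entries and function name are invented for illustration; the paper's actual sources of natural errors are not reproduced here.

    import random

    # Toy lexicon mapping words to observed misspellings; the real data
    # would come from error-annotated corpora, not this hand-made dict.
    MISSPELLINGS = {
        "whale": ["wale", "whlae"],
        "because": ["becuase", "becasue"],
    }

    def apply_natural_noise(tokens, noise_probability):
        """Attempt to noise each token with the given probability; the
        realized fraction of noised tokens is lower, since only words
        with a known misspelling can be corrupted."""
        noised, changed = [], 0
        for tok in tokens:
            if random.random() < noise_probability and tok in MISSPELLINGS:
                noised.append(random.choice(MISSPELLINGS[tok]))
                changed += 1
            else:
                noised.append(tok)
        return noised, changed / max(len(tokens), 1)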
Related work
  • The use of noise to improve robustness in machine learning has a long history (e.g., Holmstrom and Koistinen, 1992; Wager et al., 2013), with early work by Bishop (1995) demonstrating a connection between additive noise and regularization. To achieve robustness to orthographic errors, we require noise that operates at the character level. Heigold et al. (2017) demonstrated that synthetic noising operations such as random swaps and replacements can degrade performance when inserted at test time; they also show that some robustness can be obtained by inserting the same noise at training time. Similarly, Sperber et al. (2017) explore the impact of speech-like noise.

    Most relevant for us is the work of Belinkov and Bisk (2018), who evaluated on natural noise obtained from Wikipedia edit histories (e.g., Max and Wisniewski, 2010). They find that robustness to natural noise can be obtained by training on the same noise model, but that (a) training on synthetic noise does not yield robustness to natural noise at test time, and (b) training on natural noise significantly impairs performance on clean text. In contrast, we show that training on the right blend of synthetic noise can yield substantial improvements on natural noise at test time, without significantly impairing performance on clean data. Our ablation results suggest that deletion and insertion noise (not included by Belinkov and Bisk) are essential to achieving robustness to natural noise.
Reference
  • Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223.
  • Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In ICLR.
  • Chris M. Bishop. 1995. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116.
  • Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2016. The IWSLT 2016 evaluation campaign. In International Workshop on Spoken Language Translation.
  • Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. Towards robust neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1756–1766, Melbourne, Australia. Association for Computational Linguistics.
  • Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36, Melbourne, Australia. Association for Computational Linguistics.
  • Jacob Eisenstein. 2013. What to do about bad language on the internet. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 359–369.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • Georg Heigold, Günter Neumann, and Josef van Genabith. 2017. How robust are character-based word embeddings in tagging and MT against wrod scramlbing or randdm nouse? arXiv preprint arXiv:1704.04441.
  • Paul Michel and Graham Neubig. 2018. MTNT: A testbed for machine translation of noisy text. In EMNLP.
  • Luz Rello and Ricardo A. Baeza-Yates. 2012. Social media is not that bad! The lexical quality of social media. In ICWSM.
  • Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2017. Robsut wrod reocginiton via semi-character recurrent neural network. In AAAI, pages 3281–3287.
  • Karel Sebesta, Zuzanna Bedrichova, Katerina Sormova, Barbora Stindlova, Milan Hrdlicka, Tereza Hrdlickova, Jiri Hana, Vladimir Petkevic, Tomas Jelinek, Svatava Skodova, Petr Janes, Katerina Lundakova, Hana Skoumalova, Simon Sladek, Piotr Pierscieniak, Dagmar Toufarova, Milan Straka, Alexandr Rosen, Jakub Naplava, and Marie Polackova. 2017. CzeSL grammatical error correction dataset (CzeSL-GEC). LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
  • Matthias Sperber, Jan Niehues, and Alex Waibel. 2017. Toward robust neural machine translation for noisy input sequences. In International Workshop on Spoken Language Translation (IWSLT), Tokyo, Japan.
  • Vaibhav, Sumeet Singh, Craig Stewart, and Graham Neubig. 2019. Improving robustness of machine translation with synthetic noise. arXiv preprint arXiv:1902.09508.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Stefan Wager, Sida Wang, and Percy S. Liang. 2013. Dropout training as adaptive regularization. In Advances in Neural Information Processing Systems, pages 351–359.
  • Torsten Zesch. 2012. Measuring contextual fitness using error contexts extracted from the Wikipedia revision history. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 529–538. Association for Computational Linguistics.