Revisiting Modularized Multilingual NMT to Meet Industrial Demands

EMNLP 2020, pp. 5905–5918


Abstract

The complete sharing of parameters for multilingual translation (1-1) has been the mainstream approach in current research. However, degraded performance due to the capacity bottleneck and low maintainability hinders its extensive adoption in industries. In this study, we revisit the multilingual neural machine translation model that only shares modules among the same languages (M2)...

Introduction
  • With the current increase in the demand for neural machine translation (NMT), serving an increasing number of languages poses a practical problem for the industry.
  • A more practical approach is to limit the number of models by sharing components among them (Dong et al., 2015; Firat et al., 2016a; Ha et al., 2016; Johnson et al., 2017).
  • A fully shared model, which uses only one encoder and one decoder to translate all directions (Ha et al., 2016; Johnson et al., 2017), has been the most popular method because of its compactness; a minimal sketch of this setup follows this list.
  • Zhang et al. (2020) explicitly identified the capacity bottleneck problem of the 1-1 model by showing a clear decrease in performance when translation directions are doubled.
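As context for the fully shared approach mentioned above, the following minimal Python sketch illustrates the target-language-token trick popularized by Johnson et al. (2017): a single shared model serves every direction, and the output language is selected by prepending a token to the source sentence. The function name and example sentences are illustrative assumptions, not code from the paper.

# Minimal sketch of the fully shared (1-1) setup in the style of
# Johnson et al. (2017): one encoder-decoder serves every direction,
# and the target language is chosen by a token prepended to the source.

def add_target_token(source_tokens: list[str], target_lang: str) -> list[str]:
    """Prepend a target-language token, e.g. <2de>, to the source sentence."""
    return [f"<2{target_lang}>"] + source_tokens

# The same shared model would translate En->De and En->Ko,
# distinguished only by the prepended token.
print(add_target_token("this is an example".split(), "de"))
# ['<2de>', 'this', 'is', 'an', 'example']
print(add_target_token("this is an example".split(), "ko"))
# ['<2ko>', 'this', 'is', 'an', 'example']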
Highlights
  • Our findings suggest that the M2 can be a competent candidate for multilingual translation in industries
  • With the current increase in the demand for neural machine translation (NMT), serving an increasing number of languages poses a practical problem for the industry
  • A straightforward solution is to have multiple single-directional models, which is unsustainable owing to the quadratic increase in the number of models as more languages are introduced
  • To resolve the capacity bottleneck problem while enjoying the benefits, we identify the effects of multi-way training in a carefully controlled environment
  • By extensively comparing the single models, 1-1 model, and M2 in varying conditions, we find that the M2 can benefit from multi-way training through data-diversification and regularization while suffering less from capacity bottlenecks
Methods
  • Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1723–1732.
  • Carlos Escolano, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. From bilingual to multilingual neural machine translation by incremental training. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 236–242.
Conclusion
  • The authors re-evaluate the M2 model and suggest it as an appropriate choice for multilingual translation in industries.
  • By extensively comparing the single models, 1-1 model, and M2 in varying conditions, the authors find that the M2 can benefit from multi-way training through data-diversification and regularization while suffering less from capacity bottlenecks.
  • The authors suggest that the M2 model is maintainable because of its interlingual space.
  • The interlingual space enables incremental training in a simple manner and yields competitive zero-shot performance for incrementally added languages; a sketch of this incremental recipe follows this list.
  • The authors hope that this study sheds light on the relatively disregarded M2 model and provides a benchmark for selecting a model among varying levels of shared components
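The incremental-training idea summarized above can be pictured with a short, hedged sketch (not the authors' code): per-language encoder and decoder modules live in a registry, the already-trained modules are frozen, and only the newly added language's modules receive gradient updates. Module sizes, the use of nn.TransformerEncoder as a stand-in for both encoder and decoder stacks, and the optimizer settings are assumptions for illustration.

# Hypothetical sketch of incremental training in a modularized (M2) setup:
# existing per-language modules are frozen, and only the new language's
# encoder and decoder are optimized (paired with the frozen English modules).
import torch
import torch.nn as nn

d_model = 512

def make_module() -> nn.Module:
    # Stand-in for a language-specific Transformer encoder or decoder stack.
    return nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        num_layers=2,
    )

# Existing M2 modules (e.g. already trained on En, De, Es).
encoders = {lang: make_module() for lang in ["en", "de", "es"]}
decoders = {lang: make_module() for lang in ["en", "de", "es"]}

# Freeze everything that is already trained.
for module in list(encoders.values()) + list(decoders.values()):
    for p in module.parameters():
        p.requires_grad = False

# Add a new language (e.g. Nl) and train only its modules.
encoders["nl"], decoders["nl"] = make_module(), make_module()
trainable = list(encoders["nl"].parameters()) + list(decoders["nl"].parameters())
optimizer = torch.optim.Adam(trainable, lr=5e-4)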
Tables
  • Table1: SacreBLEU test scores of single models, 1-1, and M2 trained using a completely balanced, non-sharing dataset. Values in parentheses indicate the performance difference from single models
  • Table2: Averaged SacreBLEU test scores of single models, 1-1, and M2 trained using a balanced dataset of different configurations. M2M indicates the training of full many-to-many directions among languages (12 directions), whereas JM2M represents the training of directions that only include English on one side (6 directions). ∗ indicates that the score is averaged only on English-centric directions
  • Table3: SacreBLEU test scores of single models, 1-1 model, and M2 trained using an unbalanced, completely non-sharing dataset. 1:1:1, 1:2:4, and 1:5:25 represent the ratios of the low, medium, and high resource pairs, respectively. Values in parentheses indicate the performance difference from single models in respective environments
  • Table4: Averaged test SacreBLEU scores of 1-1 and
  • Table5: SacreBLEU test scores of a single model and incremented modules of the M2. Values in parentheses indicate the number of languages involved in the
  • Table6: SacreBLEU zero-shot test scores of the English-pivoted single models and incremented modules from
  • Table7: Cosine similarity score of encoder outputs and SacreBLEU score of mono-direction translation (En-En). Values in parentheses indicate the number of languages involved in the M2 (3: De, En, Nl; 4: 3 + Es; 5: 4 +
  • Table8: Division of multi-parallel parts for each pair in section 4
  • Table9: Division of multi-parallel parts for each pair in section 5
  • Table10: The amount of data for each pair in sections 4 and 5. The data was divided into parts of 500K (section 4) and 250K (section 5), and the parts were assigned to pairs so that no two directions of the same side share the same part
  • Table11: Detailed scores of 2 and 3 in Table 2
  • Table12: Detailed scores of 4 and 5 in Table 2
  • Table13: Detailed scores of the models of Table 4. M2(+10) indicates the selected best model trained with additional
  • Table14: Detailed scores of the models in Table 7
Related work
  • Neural machine translation

    The most popular framework for NMT is the encoder-decoder model (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2014; Luong et al., 2015; Vaswani et al., 2017). Adopting an attention module greatly improved the performance of the encoder-decoder model by using a context vector instead of a fixed-length vector (Bahdanau et al., 2014; Luong et al., 2015). By exploiting multiple attention heads, the Transformer has become the de facto standard model in NMT (Vaswani et al., 2017; Ott et al., 2018; So et al., 2019).
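To make the attention remark above concrete, here is a minimal NumPy sketch of the idea from Bahdanau et al. (2014) and Luong et al. (2015): rather than compressing the source into one fixed-length vector, the decoder forms a per-step context vector as a weighted sum of all encoder states. The dot-product scoring function, shapes, and random values are illustrative assumptions.

# Minimal sketch of attention as a context vector over encoder states.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(7, 512))   # 7 source positions, d = 512
decoder_state = rng.normal(size=(512,))      # current decoder hidden state

scores = encoder_states @ decoder_state      # dot-product alignment scores
weights = softmax(scores)                    # attention distribution over source
context = weights @ encoder_states           # context vector fed to the decoder

print(weights.shape, context.shape)          # (7,) (512,)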

    Multilingual neural machine translation

    Dabre et al. (2019) categorized the architectures of multilingual NMT models according to their degrees of parameter sharing. We briefly introduce the models under their criteria.

    Early multilingual NMT models minimally shared the parameters by sharing a language-specific encoder (Dong et al., 2015; Lee et al., 2017) or a language-specific decoder (Zoph and Knight, 2016). Firat et al. (2016a) extended this to sharing both language-specific encoders and decoders with a shared attention module.
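A hedged sketch of the sharing spectrum discussed in this section, using the paper's framing: per-direction single models grow quadratically with the number of languages, the fully shared 1-1 model uses a single encoder-decoder, and the modularized M2 keeps one encoder and one decoder per language and assembles them per direction. The language list and the routing helper are illustrative assumptions.

# Illustrative comparison of how many trainable units each sharing scheme
# needs for N languages, plus the per-direction module routing of the M2.
from itertools import permutations

languages = ["en", "de", "es", "nl", "ko"]
N = len(languages)

# Single models: one full encoder-decoder per translation direction.
single_models = len(list(permutations(languages, 2)))   # N * (N - 1)

# 1-1: one fully shared encoder-decoder for every direction.
one_to_one = 1

# M2: one encoder and one decoder per language, combined per direction.
m2_modules = 2 * N

print(single_models, one_to_one, m2_modules)  # 20 1 10

def m2_route(src: str, tgt: str) -> tuple[str, str]:
    """Pick the modules used for a direction, e.g. ('enc_de', 'dec_ko')."""
    return f"enc_{src}", f"dec_{tgt}"

print(m2_route("de", "ko"))  # ('enc_de', 'dec_ko')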
Study subjects and analysis
pairs: 3
We also distinguish between two different dataset compositions: the sharing case, where all language pairs share the same sentence set, and the non-sharing case, where there is no overlap between different pairs. To illustrate, a multi-parallel set ‘En - Es - Ko’ can be shared across all three possible pairs (En - Es, En - Ko, Es - Ko) or used only once for one pair. Considering that multi-parallel data is rare in practice, we compared the models in a strictly non-sharing environment.
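The sharing versus non-sharing compositions described above can be illustrated with a small Python sketch over a toy multi-parallel corpus; the corpus contents, slice sizes, and pair ordering are assumptions for illustration, not the paper's actual data split.

# Toy multi-parallel corpus: index i is the same sentence in all languages.
from itertools import combinations

corpus = [(f"en_{i}", f"es_{i}", f"ko_{i}") for i in range(9)]
langs = ["en", "es", "ko"]
pairs = list(combinations(langs, 2))          # [('en','es'), ('en','ko'), ('es','ko')]

# Sharing: every pair uses the full multi-parallel set.
sharing = {pair: corpus for pair in pairs}

# Non-sharing: the corpus is cut into disjoint parts, one part per pair,
# so no sentence is seen by more than one language pair.
part = len(corpus) // len(pairs)
non_sharing = {pair: corpus[i * part:(i + 1) * part] for i, pair in enumerate(pairs)}

print({p: len(v) for p, v in sharing.items()})      # every pair sees all 9 sentences
print({p: len(v) for p, v in non_sharing.items()})  # 3 disjoint sentences per pair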

pairs: 250000
2) Train the new module with the auxiliary directions. In Table 5, ∗ means that the model is trained using the supervision of 250 thousand pairs from the auxiliary directions.

pairs: 3
Table 7 shows the cosine similarity and mono-direction translation scores of the M2. As the M2 is trained with more languages, the cosine similarity of all three pairs increases, which implies higher language invariance in the interlingual space. However, the gain from marginal languages decreases as the number of languages increases.
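A minimal sketch of the language-invariance measurement mentioned above: encoder outputs for the same sentence in different languages are mean-pooled into sentence vectors and compared with cosine similarity. The random arrays stand in for real encoder activations and are purely illustrative.

# Mean-pool per-language encoder outputs and compare them pairwise.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
# Pretend encoder outputs for one sentence encoded from De, En, and Nl:
# shape (sequence_length, hidden_size), mean-pooled to a sentence vector.
pooled = {lang: rng.normal(size=(10, 512)).mean(axis=0) for lang in ["de", "en", "nl"]}

for a, b in [("de", "en"), ("de", "nl"), ("en", "nl")]:
    print(a, b, round(cosine(pooled[a], pooled[b]), 3))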

Reference
  • Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. Massively multilingual neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3874–3884.
  • Maruan Al-Shedivat and Ankur Parikh. 2019. Consistency by agreement in zero-shot neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1184–1197.
  • Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, and Wolfgang Macherey. 2019a. The missing ingredient in zero-shot neural machine translation. arXiv preprint arXiv:1903.07091.
  • Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, et al. 2019b. Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv preprint arXiv:1907.05019.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Ankur Bapna and Orhan Firat. 2019. Simple, scalable adaptation for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
  • Graeme Blackwood, Miguel Ballesteros, and Todd Ward. 2018. Multilingual neural machine translation with task-specific attention. In Proceedings of the 27th International Conference on Computational Linguistics.
  • Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734.
  • Raj Dabre, Chenhui Chu, and Anoop Kunchukuttan. 2019. A survey of multilingual neural machine translation. arXiv preprint arXiv:1905.05395.
  • Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1723–1732.
  • Carlos Escolano, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. From bilingual to multilingual neural machine translation by incremental training. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 236–242.
  • Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, and Mikel Artetxe. 2020. Multilingual machine translation: Closing the gap between shared and language-specific encoder-decoders.
  • Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016a. Multi-way, multilingual neural machine translation with a shared attention mechanism. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 866–875.
  • Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016b. Zero-resource translation with multi-lingual neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, pages 79–86.
  • Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 66–75.
  • Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71.
  • Jason Lee, Kyunghyun Cho, and Thomas Hofmann. 2017. Fully character-level neural machine translation without explicit segmentation. Transactions of the Association for Computational Linguistics, 5:365–378.
  • Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation. arXiv preprint arXiv:2001.08210.
  • Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, and Jason Sun. 2018. A neural interlingua for multilingual machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers.
  • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421.
  • Graham Neubig and Junjie Hu. 2018. Rapid adaptation of neural machine translation to new languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Toan Q. Nguyen and David Chiang. 2017. Transfer learning across low-resource, related languages for neural machine translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 296–301.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48–53.
  • Myle Ott, Sergey Edunov, David Grangier, and Michael Auli. 2018. Scaling neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers.
  • Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. 2018. Contextual parameter generation for universal neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191.
  • Devendra Sachan and Graham Neubig. 2018. Parameter sharing methods for multilingual self-attentional translation models. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 261–271.
  • Holger Schwenk and Matthijs Douze. 2017. Learning joint multilingual sentence representations with neural machine translation. In Proceedings of the 2nd Workshop on Representation Learning for NLP.
  • Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2019. Multilingual unsupervised NMT using shared encoder and language-specific decoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
  • David So, Quoc Le, and Chen Liang. 2019. The evolved transformer. In International Conference on Machine Learning.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
  • Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, and Tie-Yan Liu. 2019a. Multilingual neural machine translation with language clustering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
Author
Sungwon Lyu
Bokyung Son
Kichang Yang
Jaekyoung Bae