Learning Adaptive Segmentation Policy for Simultaneous Translation

EMNLP 2020, pp. 2280–2289.


Abstract

Balancing accuracy and latency is a great challenge for simultaneous translation. To achieve high accuracy, the model usually needs to wait for more streaming text before translating, which results in increased latency. However, keeping latency low would probably hurt accuracy. Therefore, it is essential to segment the ASR output into appropriate segments…

Introduction
  • Simultaneous translation has attracted increasing interest in both the research and industry communities.
  • It aims at real-time translation, demanding high translation quality with an as-short-as-possible delay between speech and translation output.
  • The MT system takes sentences as input, while the streaming ASR output has no segmentation boundaries.
  • Finding a policy that splits the ASR output into appropriate segments is therefore a vital issue for simultaneous translation.
Highlights
  • In recent years, simultaneous translation has attracted increasing interest in both the research and industry communities
  • Inspired by human interpreters, we propose a novel adaptive segmentation policy that splits the ASR output into meaningful units (MUs) for simultaneous translation
  • We propose a novel prefix-attention method to extract fine-grained MUs by training a neural machine translation (NMT) model that generates monotonic translations
  • We aim to split the streaming text into MUs to achieve a trade-off between translation quality and latency
  • We model MU segmentation as a classification problem and train a classifier that receives streaming text from the ASR output and detects whether it constitutes an MU (Figure 1 (a) and (b)); a minimal sketch of this loop follows below
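As a rough illustration of how such a classifier could drive the streaming loop, consider the sketch below. The names `mu_classifier` and `translate` are illustrative assumptions rather than the authors' implementation, and the 0.7 threshold mirrors the δ reported with Table 3.

```python
# Hypothetical streaming loop around an MU classifier.
# mu_classifier(prefix) -> p(MU | prefix); translate(text) -> translation.
DELTA = 0.7  # decision threshold; Table 3 reports human evaluation at delta = 0.7

def simultaneous_translate(stream_words, mu_classifier, translate):
    """Consume ASR words one by one and emit a translation whenever the
    classifier judges the buffered prefix to form a meaningful unit (MU)."""
    buffer = []
    for word in stream_words:
        buffer.append(word)
        prefix = " ".join(buffer)
        if mu_classifier(prefix) > DELTA:  # prefix forms a complete MU
            yield translate(prefix)        # translate it monotonically
            buffer = []                    # start accumulating the next MU
    if buffer:                             # flush any remainder at stream end
        yield translate(" ".join(buffer))
```

Each detected MU is translated monotonically and appended to the output, matching the interpreter behavior illustrated in Table 1.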
Methods
  • To keep each MU as short as possible, the authors incrementally feed the source text word-by-word into an MT model and detect whether the translation yt of the current source prefix is a prefix of the full-sentence translation y (see the sketch after this list).
  • The source side contains many “SOV” structures, while English is an “SVO” language.
  • In such cases, both MU and MU++ must wait for the verb at the end of the sentence before generating an accurate translation.
  • Human annotators rated each translation as Bad, OK, or Good based on its adequacy, correctness, and fluency.
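A minimal sketch of this translation-prefix test for generating MU training data follows. Here `nmt_translate` is an assumed full-sentence NMT decoding function, and the string-level prefix check is a simplification of the paper's prefix-attention method.

```python
# Hypothetical sketch: position t ends a meaningful unit when the
# translation of the source prefix x_1..x_t is itself a prefix of the
# full-sentence translation y, so translating it now cannot contradict y.
def extract_mu_boundaries(source_words, nmt_translate):
    full_translation = nmt_translate(source_words)  # y
    boundaries = []
    for t in range(1, len(source_words) + 1):
        y_t = nmt_translate(source_words[:t])       # translation of prefix x_1..x_t
        if y_t and full_translation.startswith(y_t):
            boundaries.append(t)                    # mark an MU boundary after word t
    return boundaries
```

The resulting boundaries yield labeled prefixes (MU vs. not yet an MU) of the kind shown in Table 2, which are then used to train the MU detection classifier.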
Results
  • Experimental results on NIST Chinese-English and WMT 2015 German-English datasets show that the method outperforms the previous state-of-the-art methods in balancing translation accuracy and latency (latency is commonly measured with Average Lagging; see the sketch after this list).
  • Figure 5 shows the De-En translation results.
  • The compared method’s full-sentence translation model uses a bidirectional encoder while its streaming model uses unidirectional encoders, resulting in a performance gap between its full-sentence and streaming models.
  • Both models in the authors’ approach, by contrast, use a bidirectional encoder, avoiding such a gap.
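Accuracy is reported with BLEU (Papineni et al., 2002); latency in this line of work is commonly measured with Average Lagging (AL), introduced by Ma et al. (2019). Below is a minimal sketch of AL under its standard definition; the list-based interface is an illustrative assumption.

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging (Ma et al., 2019): the average number of source words
    the system lags behind an ideal simultaneous translator.
    g[i] is the number of source words read before emitting target word i+1."""
    gamma = tgt_len / src_len  # target-to-source length ratio
    # tau: the first target step at which the whole source has been read
    tau = next(t for t, read in enumerate(g, start=1) if read >= src_len)
    return sum(g[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau

# Example: a wait-3-style schedule on a 5-word sentence gives AL = 3.0
print(average_lagging([3, 4, 5, 5, 5], src_len=5, tgt_len=5))
```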
Conclusion
  • The authors propose a novel adaptive segmentation policy for simultaneous translation.
  • They first generate training data for MU detection via a translation-prefix based method, keeping the segmentation model consistent with the translation model.
  • They also propose a refined method to extract fine-grained MUs and reduce latency.
  • Experimental results on both Chinese-English and German-English show that the model outperforms the previous state-of-the-art.
  • The method obtains a good trade-off between translation accuracy and latency and can be deployed in a practical simultaneous translation system.
Objectives
  • The authors aim to split the streaming text into MUs to achieve a trade-off between translation quality and latency.
  • Their goal is to find a segmentation S_MU with appropriate granularity.
Tables
  • Table 1: A comparison of Chinese-English text translation and simultaneous interpretation. A text translator translates the full sentence after reading all the source words, producing a translation with long-distance reordering by moving the initial part (underlined) of the source sentence to the end of the target side. In simultaneous interpreting, by contrast, an interpreter starts to translate as soon as he or she judges that the received streaming text constitutes an MU (“||”) and translates it monotonically.
  • Table 2: Training samples for the MU detection model, generated from the MU segmentation result in Figure 2.
  • Table 3: Human evaluation of Zh-En and De-En translations on 200 sentences with δ = 0.7.
Related work
  • Recent simultaneous translation work focuses on finding a policy that decides whether to wait for another source word or to generate a target word. Rangarajan Sridhar et al. (2013) investigated a variety of policies based on lexical cues. Oda et al. (2014) proposed to optimize a segmentation model with the goal of achieving better translation quality. However, the performance of these methods is largely limited by weak features such as N-grams and POS tags. Other research learns the policy via reinforcement learning, aiming for good translation quality and low latency (Grissom II et al., 2014; Satija and Pineau, 2016; Gu et al., 2017; Alinejad et al., 2018), but reinforcement learning is notorious for its unstable training process. Cho and Esipova (2016) proposed a heuristic measure to determine the policy at inference time, without using a deep model. Ma et al. (2019) and Dalvi et al. (2018) applied fixed policies independent of contextual information, which inevitably have to guess the future context during translation (Zheng et al., 2019a). Some work applies advanced attention mechanisms that replace softmax attention with a stepwise Bernoulli selection probability (Raffel et al., 2017). Arivazhagan et al. (2019) proposed an infinite lookback that integrates hard monotonic attention with soft attention, and Ma et al. (2020) proposed multi-head monotonic attention, obtaining further improvements. However, the autoregressive training process makes their exploration inefficient.
References
  • Ashkan Alinejad, Maryam Siahbani, and Anoop Sarkar. 2018. Prediction improves simultaneous neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3022–3027.
  • Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, and Colin Raffel. 2019. Monotonic infinite lookback attention for simultaneous machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1313–1323, Florence, Italy. Association for Computational Linguistics.
  • Kyunghyun Cho and Masha Esipova. 2016. Can neural machine translation do simultaneous translation? arXiv preprint arXiv:1606.02012.
  • Fahim Dalvi, Nadir Durrani, Hassan Sajjad, and Stephan Vogel. 2018. Incremental decoding and training methods for simultaneous translation in neural machine translation. arXiv preprint arXiv:1806.03661.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daumé III. 2014. Don’t until the final verb wait: Reinforcement learning for simultaneous machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1342–1352.
  • Jiatao Gu, Graham Neubig, Kyunghyun Cho, and Victor O.K. Li. 2017. Learning to translate in real-time with neural machine translation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1053–1062.
  • Mingbo Ma, Liang Huang, Hao Xiong, Kaibo Liu, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, and Haifeng Wang. 2019. STACL: Simultaneous translation with integrated anticipation and controllable latency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
  • Xutai Ma, Juan Pino, James Cross, Liezl Puzon, and Jiatao Gu. 2020. Monotonic multihead attention. In ICLR 2020.
  • Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2014. Optimizing segmentation strategies for simultaneous speech translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 551–556.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.
  • Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, and Douglas Eck. 2017. Online and linear-time attention by enforcing monotonic alignments. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2837–2846. JMLR.org.
  • Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Andrej Ljolje, and Rathinavelu Chengalvarayan. 2013. Segmentation strategies for streaming speech translation. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 230–238, Atlanta, Georgia. Association for Computational Linguistics.
  • Harsh Satija and Joelle Pineau. 2016. Simultaneous machine translation using deep reinforcement learning. In Proceedings of the Workshops of the International Conference on Machine Learning, New York, pages 110–119.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725.
  • Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.
  • Baigong Zheng, Kaibo Liu, Renjie Zheng, Mingbo Ma, Hairong Liu, and Liang Huang. 2020. Simultaneous translation policies: From fixed to adaptive. arXiv preprint arXiv:2004.13169.
  • Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. 2019a. Simpler and faster learning of adaptive policies for simultaneous translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP).
  • Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. 2019b. Simultaneous translation with flexible policy via restricted imitation learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5816–5822, Florence, Italy. Association for Computational Linguistics.
Authors
Ruiqing Zhang
Chuanqiang Zhang