
An Empirical Study of Generation Order for Machine Translation

EMNLP 2020, pp. 5764-5773.


Abstract

In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders ...

Introduction
  • Neural sequence models (Sutskever et al, 2014; Cho et al, 2014) have been successfully applied to a broad range of tasks in recent years.
  • The standard left-to-right factorization of the output distribution is a modeling choice rather than a necessity, which leaves open the possibility that a non-left-to-right factorization of the joint distribution over output sequences could outperform the usual monotonic ordering
  • To address these concerns, several recent approaches have been proposed for insertion-based sequence modeling, in which sequences are constructed by repeatedly inserting tokens at arbitrary locations in the output rather than only at the right-most position (a toy sketch of this process follows this list).
  • The authors give a brief overview of the model before moving on to the details of the investigation
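The insertion-based process described above can be made concrete with a toy example. The following is a minimal, illustrative Python sketch (not the authors' code): a sequence is built by repeatedly inserting a token into one of the slots of the partial output, so the same sentence can be reached through many different orders. The slot convention and the two action lists are assumptions made for the sketch.

```python
def apply_insertions(actions):
    """Replay a list of (slot, token) insertion actions.

    Slot k denotes the gap before the k-th token of the current partial
    output; slot len(output) is the position after the last token.
    """
    output = []
    for slot, token in actions:
        output.insert(slot, token)
        print(f"insert {token!r} at slot {slot}: {output}")
    return output

# The same target sentence produced by two different generation orders.
left_to_right = [(0, "the"), (1, "quick"), (2, "brown"), (3, "fox")]
balanced_tree = [(0, "quick"), (1, "brown"), (0, "the"), (3, "fox")]

assert apply_insertions(left_to_right) == apply_insertions(balanced_tree)
```

Because every permutation of the target tokens corresponds to some sequence of (slot, token) actions, the space of possible generation orders is very large, which is what the paper sets out to explore.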
Highlights
  • Neural sequence models (Sutskever et al, 2014; Cho et al, 2014) have been successfully applied to a broad range of tasks in recent years
  • Several recent approaches have been proposed for insertion-based sequence modeling, in which sequences are constructed by repeatedly inserting tokens at arbitrary locations in the output rather than only at the right-most position
  • In the subsections that follow, we describe a wide variety of generation orders, each characterized by a different order function O(a) (an illustrative sketch of such order functions follows this list)
  • For the model-based easy-first order, we were unable to identify any strong patterns in the generation order
  • We investigated a broad array of generation orders for machine translation using an insertion-based sequence generation model, the Insertion Transformer
  • We found that regardless of the type of strategy selected, be it location-based, frequency-based, length-based, alphabetical, model-based, or even random, the Insertion Transformer is able to learn it with high fidelity and produce high-quality output in the selected order
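To make the idea of an order function concrete, here is an illustrative Python sketch. The definitions below are simplified assumptions loosely inspired by the order names used in the paper (left-to-right, common-first, rare-first, longest-first, alphabetical, random); they are not the exact formulas of Table 1, which are defined in terms of ranks over the valid action set A*. Each function scores an action a that inserts word w into slot s of span (i, j); lower values mean the action should happen earlier, and each order is used on its own.

```python
import random

# All functions share the signature (w, s, i, j, freq) for uniformity,
# even when some arguments are unused. Lower score = inserted earlier.

def left_to_right(w, s, i, j, freq):
    return s                      # always extend the left-most open slot

def right_to_left(w, s, i, j, freq):
    return -s                     # always extend the right-most open slot

def common_first(w, s, i, j, freq):
    return -freq.get(w, 0)        # frequent words before rare ones

def rare_first(w, s, i, j, freq):
    return freq.get(w, 0)         # rare words before frequent ones

def longest_first(w, s, i, j, freq):
    return -len(w)                # longer words before shorter ones

def alphabetical(w, s, i, j, freq):
    return w                      # "A -> z": lexicographic order on the word

def random_order(w, s, i, j, freq):
    return random.random()        # an arbitrary order (the paper's exact
                                  # randomization scheme may differ)

# Example: ranking candidate actions under the common-first order.
freq = {"the": 1000, "quick": 50, "fox": 20}
actions = [("quick", 1, 1, 1), ("the", 0, 0, 0), ("fox", 3, 3, 3)]
ranked = sorted(actions, key=lambda a: common_first(a[0], a[1], a[2], a[3], freq))
print([a[0] for a in ranked])     # ['the', 'quick', 'fox']
```

The point the paper exploits is that any such scoring function induces an oracle generation policy that the model can be trained to imitate.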
Methods
  • The authors train and evaluate models for each order on two standard machine translation datasets: WMT14 En-De and WMT18 En-Zh.
  • En-Zh evaluation is carried out using sacreBLEU (Post, 2018)
  • In both cases, the authors train all models for 1M steps using sequence-level knowledge distillation (Hinton et al, 2015; Kim and Rush, 2016) from a base Transformer (Vaswani et al, 2017).
  • The authors perform a sweep over temperatures τ ∈ {0.5, 1, 2} and EOS penalties ∈ {0, 0.5, 1, 1.5, ..., 8} (Stern et al, 2019) on the development set, but otherwise perform no additional hyperparameter tuning, borrowing all other model and optimization settings from the base Transformer (a sketch of the temperature's role in the soft loss follows this list)
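As a rough guide to what the temperature τ controls, here is a minimal sketch assuming the soft order-reward loss follows the Insertion Transformer's recipe: the valid actions are weighted by a softmax over their negative order values, so the action preferred by the chosen oracle order receives the most credit. The helper name is hypothetical and the exact loss formulation may differ from the paper's.

```python
import math

def soft_action_weights(order_values, tau=1.0):
    """Map order values O(a) for the valid actions A* to loss weights.

    Lower O(a) => larger weight. As tau -> 0 this approaches a hard,
    single-action target; a large tau approaches a uniform weighting
    over all valid actions.
    """
    scores = [math.exp(-v / tau) for v in order_values]
    total = sum(scores)
    return [score / total for score in scores]

# Three valid insertions whose oracle order values are 0, 1, and 2.
print(soft_action_weights([0.0, 1.0, 2.0], tau=0.5))  # ~[0.867, 0.117, 0.016]
print(soft_action_weights([0.0, 1.0, 2.0], tau=2.0))  # ~[0.506, 0.307, 0.186]
```

The training objective for a given order would then presumably be the negative log-likelihood of the valid actions weighted in this way.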
Results
  • The authors measure the quality of the models by evaluating their performance on their respective test sets.
  • The authors note that while the soft left-to-right and right-to-left losses perform substantially better than the hard loss employed in the original work by Stern et al (2019), performance does suffer when using parallel decoding for those models, which is generally not the case for the other orderings
  • The authors believe this is due in part to exposure bias issues arising from the monotonic ordering
Conclusion
  • The authors investigated a broad array of generation orders for machine translation using an insertion-based sequence generation model, the Insertion Transformer.
  • The authors found that regardless of the type of strategy selected, be it location-based, frequency-based, length-based, alphabetical, model-based, or even random, the Insertion Transformer is able to learn it with high fidelity and produce high-quality output in the selected order
  • This is especially true for English-German single sentence translation, where the authors by and large found that order does not matter.
  • This opens a wide range of possibilities for generation tasks where monotonic orderings are not the most natural choice, and the authors would be excited to explore some of these areas in future work
Tables
  • Table 1: Order functions for an action a corresponding to the insertion of word w into slot s within span (i, j). The rank terms are computed with respect to the set of words from the valid action set A∗
  • Table 2: Percentage of insertions that follow the target order exactly, averaged over the development set (an illustrative computation of this metric follows this list)
  • Table 3: Test BLEU results for WMT14 En-De newstest2014 and WMT18 En-Zh newstest2018 with serial and parallel decoding
  • Table 4: Development BLEU results for WMT14 En-De newstest2013 and WMT18 En-Zh newstest2017. The first number in each column is the result obtained without an EOS penalty, while the second number in parentheses is the score obtained with the best EOS penalty for that setting
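The Table 2 metric can be pictured along the following lines; this is an assumption about the bookkeeping, not the authors' evaluation code, and the helper below is purely illustrative.

```python
def order_following_accuracy(chosen_actions, oracle_actions):
    """Fraction of decoding steps where the model's insertion matches the
    action the target order would take; both arguments are parallel lists
    of (slot, token) pairs for one sentence."""
    matches = sum(c == o for c, o in zip(chosen_actions, oracle_actions))
    return matches / max(len(oracle_actions), 1)

# Example: the model deviates from the oracle order at two of four steps.
chosen = [(0, "the"), (1, "quick"), (2, "fox"), (2, "brown")]
oracle = [(0, "the"), (1, "quick"), (2, "brown"), (3, "fox")]
print(order_following_accuracy(chosen, oracle))  # 0.5
```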
Related work
  • In recent work, several insertion-based frameworks have been proposed for the generation of sequences in orders other than strictly left-to-right.
    [Figure/table residue: per-order results for English-German and English-Chinese, covering the orders Uniform, Binary Tree, Random, Common First, Rare First, Shortest First, Longest First, A -> z, z -> A, Easy First, and Hard First.]
Reference
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP.
  • Nicolas Ford, Daniel Duckworth, Mohammad Norouzi, and George E. Dahl. 2018. The Importance of Generation Order in Language Modeling. In EMNLP.
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, and Richard Socher. 2018. Non-Autoregressive Neural Machine Translation. In ICLR.
  • Jiatao Gu, Qi Liu, and Kyunghyun Cho. 2019. Insertion-based Decoding with Automatically Inferred Generation Order. arXiv preprint.
  • Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. In NIPS Deep Learning and Representation Learning Workshop.
  • Yoon Kim and Alexander M. Rush. 2016. Sequence-Level Knowledge Distillation. In EMNLP.
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement. In EMNLP.
  • Matt Post. 2018. A Call for Clarity in Reporting BLEU Scores. In WMT.
  • Mitchell Stern, William Chan, Jamie Kiros, and Jakob Uszkoreit. 2019. Insertion Transformer: Flexible Sequence Generation via Insertion Operations. In ICML.
  • Ilya Sutskever, Oriol Vinyals, and Quoc Le. 2014. Sequence to Sequence Learning with Neural Networks. In NIPS.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In NIPS.
  • Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2015. Order Matters: Sequence to sequence for sets. In ICLR.
  • Sean Welleck, Kiante Brantley, Hal Daumé III, and Kyunghyun Cho. 2019. Non-Monotonic Sequential Text Generation. In ICML.