Big Bidirectional Insertion Representations for Documents.
NGT@EMNLP-IJCNLP, pp. 194–198 (2019)
The Insertion Transformer is well suited for long-form text generation due to its parallel generation capabilities, requiring $O(\log_2 n)$ generation steps to generate $n$ tokens. However, modeling long sequences is difficult, as there is more ambiguity captured in the attention mechanism. This work proposes the Big Bidirectional Insertion Representations for Documents (Big BIRD)...
- Insertion-based models (Stern et al., 2019; Welleck et al., 2019; Gu et al., 2019; Chan et al., 2019) have been introduced for text generation.
- An autoregressive left-to-right model requires $O(n)$ generation steps to generate $n$ tokens, whereas the Insertion Transformer (Stern et al., 2019) and KERMIT (Chan et al., 2019), following a balanced binary tree policy, require only $O(\log_2 n)$ generation steps.
- This is especially important for long-form text generation, for example, document-level machine translation.
- There are two primary methods for including context in a document-level machine translation model, compared to a sentence-level translation model.
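The $O(\log_2 n)$ step count of the balanced binary tree policy can be verified with a minimal sketch (not the paper's code): each parallel step can insert one token into every open slot, so the generated sequence roughly doubles per step.

```python
import math

def insertion_steps(n):
    """Parallel generation steps under a balanced binary tree
    insertion policy: every gap between existing tokens (plus the
    two sequence ends) receives one token per step, so after step k
    at most 2**k - 1 tokens exist -> O(log2 n) steps overall."""
    steps, generated = 0, 0
    while generated < n:
        generated += generated + 1  # fill all (generated + 1) slots
        steps += 1
    return steps

for n in [1, 10, 100, 1000]:
    print(n, insertion_steps(n), math.ceil(math.log2(n + 1)))
```

The loop confirms that `insertion_steps(n)` equals `ceil(log2(n + 1))`, e.g. 1000 tokens need only 10 parallel steps versus 1000 left-to-right steps.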
- We present Big Bidirectional Insertion Representations for Documents (Big BIRD)
- The Big BIRD model is as described in Section 2, and the baseline Insertion Transformer model has exactly the same configurations except without sentence-positional embeddings
- We presented Big BIRD, an adaptation of the Insertion Transformer to document-level translation.
- In addition to a large context window, Big BIRD uses sentence-positional embeddings to directly capture sentence alignment between source and target documents. We show, both quantitatively and qualitatively, the promise of Big BIRD, with a +4.3 BLEU improvement over the baseline model and examples where Big BIRD achieves better translation quality via sentence alignment.
- We believe Big BIRD is a promising direction for document-level understanding and generation.
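The sentence-positional embeddings above can be illustrated with a minimal NumPy sketch. The shapes, table names, and the additive combination with token-positional embeddings are assumptions for illustration; the paper does not publish these details here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, max_tokens, max_sents = 8, 100, 64, 16

# Hypothetical embedding tables (randomly initialized for the sketch).
tok_emb = rng.normal(size=(vocab, d_model))       # token identity
pos_emb = rng.normal(size=(max_tokens, d_model))  # token position in document
sent_emb = rng.normal(size=(max_sents, d_model))  # index of containing sentence

def embed(token_ids, sent_ids):
    """Sum token, token-positional, and sentence-positional embeddings,
    so each token carries an explicit signal of which sentence it
    belongs to -- letting the model align sentence i of the source
    with sentence i of the target."""
    positions = np.arange(len(token_ids))
    return tok_emb[token_ids] + pos_emb[positions] + sent_emb[sent_ids]

# Four tokens: the first two in sentence 0, the last two in sentence 1.
x = embed(np.array([5, 7, 9, 3]), np.array([0, 0, 1, 1]))
print(x.shape)  # (4, 8)
```

The design choice is that sentence identity is injected additively, the same way ordinary positional embeddings are, so no architectural change to the Transformer layers is needed.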
- The authors experiment with the WMT’19 English→German document-level translation task (Barrault et al, 2019).
- The training dataset consists of parallel document-level data (Europarl, Rapid, News-Commentary) and parallel sentence-level data (WikiTitles, Common Crawl, Paracrawl).
- The authors' baseline Insertion Transformer model is given prior knowledge of the number of source sentences in each document.
- All models were trained with the SM3 optimizer (Anil et al, 2019) with momentum 0.9, learning rate 0.1, and a quadratic learning rate warm-up schedule with 10k warm-up steps.
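The quadratic warm-up schedule mentioned above can be sketched as follows. The post-warm-up behavior is not specified in this summary, so holding the peak value afterwards is an assumption made only for illustration.

```python
def learning_rate(step, peak_lr=0.1, warmup_steps=10_000):
    """Quadratic warm-up: lr ramps as (step / warmup_steps)**2 toward
    the peak over the first 10k steps. After warm-up we simply hold
    the peak (an assumption; the actual decay is not stated here)."""
    if step < warmup_steps:
        return peak_lr * (step / warmup_steps) ** 2
    return peak_lr

print(learning_rate(0))       # 0.0
print(learning_rate(5_000))   # 0.025
print(learning_rate(10_000))  # 0.1
```

A quadratic ramp keeps the learning rate very small for the earliest steps (0.025 at the halfway point rather than 0.05 for a linear ramp), which helps stabilize training before the adaptive statistics of the optimizer are reliable.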
- Table 1: WMT19 English→German Document-Level Translation
- Table 2: An example where the Insertion Transformer gets confused by sentence alignment: it maps one source sentence onto two sentences in the translation and loses semantic accuracy. When given sentence alignment explicitly, i.e. with Big BIRD, it translates the sentence coherently.
- Rohan Anil, Vineet Gupta, Tomer Koren, and Yoram Singer. 2019. Memory-Efficient Adaptive Optimization for Large-Scale Learning. In arXiv.
- Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2019. Findings of the 2019 Conference on Machine Translation. In WMT.
- William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, and Jakob Uszkoreit. 2019. KERMIT: Generative Insertion-Based Modeling for Sequences. In arXiv.
- Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP.
- Jiatao Gu, Qi Liu, and Kyunghyun Cho. 2019. Insertion-based Decoding with Automatically Inferred Generation Order. In arXiv.
- Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. Achieving Human Parity on Automatic Chinese to English News Translation. In arXiv.
- Marcin Junczys-Dowmunt. 2019. Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation. In WMT.
- Taku Kudo and John Richardson. 2018. SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing. In EMNLP: System Demonstrations, pages 66–71.
- Samuel Läubli, Rico Sennrich, and Martin Volk. 2018. Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation. In EMNLP.
- Sameen Maruf and Gholamreza Haffari. 2018. Document Context Neural Machine Translation with Memory Networks. In ACL.
- Matt Post. 2018. A Call for Clarity in Reporting BLEU Scores. In WMT.
- Mitchell Stern, William Chan, Jamie Kiros, and Jakob Uszkoreit. 2019. Insertion Transformer: Flexible Sequence Generation via Insertion Operations. In ICML.
- Ilya Sutskever, Oriol Vinyals, and Quoc Le. 2014. Sequence to Sequence Learning with Neural Networks. In NIPS.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In NIPS.
- Sean Welleck, Kianté Brantley, Hal Daumé III, and Kyunghyun Cho. 2019. Non-Monotonic Sequential Text Generation. In ICML.