Simple and Effective Retrieve-Edit-Rerank Text Generation

ACL, pp. 2532-2538, 2020.

Abstract:

Retrieve-and-edit seq2seq methods typically retrieve an output from the training set and learn a model to edit it to produce the final output. We propose to extend this framework with a simple and effective post-generation ranking approach. Our framework (i) retrieves several potentially relevant outputs for each input, (ii) edits each candidate independently, and (iii) re-ranks the edited candidates to produce the final output.

Introduction
  • Retrieve-and-edit text generation methods have received significant recent interest; editing human-authored text can potentially avoid many of the challenges of generating text from scratch, including the tendency to be overly repetitive or to degrade on longer texts (Holtzman et al., 2018, 2019).
  • Retrieve-and-edit methods have been developed for summarization (Cao et al., 2018), machine translation (Wu et al., 2019), language modeling (Guu et al., 2018), and conversation generation (Weston et al., 2018).
  • These methods first retrieve a single output from the training set, and use a learned model to edit it into the final output.
  • The authors show that gains are possible and that it helps to see what edits are made for multiple candidates before making the final decision, rather than, as in previous work, trying to select a single candidate before editing.
Highlights
  • Retrieve-and-edit text generation methods have received significant recent interest; editing human-authored text can potentially avoid many of the challenges of generating text from scratch, including the tendency to be overly repetitive or to degrade on longer texts (Holtzman et al., 2018, 2019).
  • We show that generation performance can be improved with a retrieve-edit-rerank approach that instead retrieves a set of outputs from the training set, edits each independently, and re-ranks the results to produce the final output (a minimal sketch of this pipeline follows this list).
  • We evaluate performance on the Gigaword summarization dataset (Rush et al., 2015) and on the English to Dutch (EN-NL) and English to Hungarian (EN-HU) machine translation (MT) tasks, following Bulte and Tezcan (2019).
  • The machine translation results in Table 1 show that for both EN-NL and EN-HU, the Transformer without retrieval slightly outperforms the LSTM-based Neural Fuzzy Repair (NFR) model, which includes retrieval.
  • We presented a retrieve-edit-rerank framework for seq2seq text generation.
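
    A minimal Python sketch of this pipeline is given below. It is illustrative only: retrieve and edit are hypothetical stand-ins for the Lucene retriever and the seq2seq editing model, the log-likelihood returned by edit plays the role of the model's generation score, and the default of 30 candidates follows the Gigaword setup mentioned in the Table 3 caption.

      from typing import Callable, List, Tuple

      def retrieve_edit_rerank(
          source: str,
          retrieve: Callable[[str, int], List[str]],      # hypothetical fuzzy-match retriever (e.g. backed by Lucene)
          edit: Callable[[str, str], Tuple[str, float]],  # seq2seq editor: (source, candidate) -> (output, log-likelihood)
          k: int = 30,
      ) -> str:
          """Sketch of retrieve-edit-rerank: retrieve k candidate outputs,
          edit each one independently, keep the highest-scoring result."""
          candidates = retrieve(source, k)                        # (i) retrieve k outputs from the training set
          edited = [edit(source, y) for y in candidates]          # (ii) edit each candidate independently
          best_output, _ = max(edited, key=lambda pair: pair[1])  # (iii) re-rank by the model's generation score
          return best_output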
Methods
  • 4.1 Datasets and Evaluation Metrics

    The authors test the proposed framework on the machine translation datasets English to Dutch (EN-NL) and English to Hungarian (EN-HU) following previous work (Bulte and Tezcan, 2019).
  • The training, validation, and test set sizes are 2.4M, 3,000, and 3,207, respectively, and both datasets share the same English source sentences.
  • The authors also apply their framework to the Gigaword summarization task (Rush et al., 2015).
  • The training, validation, and test set sizes are 3.8M, 189k, and 1,951, respectively.
  • The authors evaluate MT performance using BLEU scores.
  • For evaluation on Gigaword, the authors use F1 scores for ROUGE-1, ROUGE-2, and ROUGE-L with commonly used evaluation parameters; an illustrative scoring sketch follows this list.
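
    The snippet below is a hedged illustration of how these metrics are commonly computed; the paper does not name its exact scoring scripts, so the choice of the sacrebleu and rouge-score packages and their parameters is an assumption.

      # Assumed third-party packages: sacrebleu and rouge-score (not named in the paper).
      import sacrebleu
      from rouge_score import rouge_scorer

      def bleu_score(hypotheses, references):
          """Corpus-level BLEU over parallel lists of hypothesis and reference strings."""
          return sacrebleu.corpus_bleu(hypotheses, [references]).score

      def rouge_f1(hypothesis, reference):
          """F1 for ROUGE-1, ROUGE-2, and ROUGE-L on a single summary pair."""
          scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
          scores = scorer.score(reference, hypothesis)
          return {name: s.fmeasure for name, s in scores.items()}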
Results
  • The MT results in Table 1 show that for both EN-NL and EN-HU, the Transformer without retrieval slightly outperforms the LSTM-based NFR, which includes retrieval.
  • The authors compare against BiSET on Gigaword, and their best results are obtained with post-ranking, for which they use the highest-scored output according to the model.
  • The authors' retrieve-edit-rerank system with a Transformer, Lucene retrieval, and a simple but effective post-ranking function (one plausible version is sketched after this list) obtains a BLEU score increase of 6.52 on EN-NL and 7.49 on EN-HU over the current state-of-the-art NFR model.
  • While pre-ranking before editing hurts performance, with post-ranking the model outperforms the Transformer baseline and Re3Sum, obtaining improvements of 0.55 to 1.24 ROUGE points.
  • The authors leave this exploration to future work, as it is largely orthogonal to post-ranking, which is the focus of their efforts.
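
    The Table 3 caption refers to a most-frequent post-ranked output whose representative is chosen by generation score (log-likelihood). The sketch below is one plausible reading of such a post-ranker, not necessarily the authors' exact function: pick the most frequently generated output and use the model's log-likelihood only to break ties.

      from collections import Counter
      from typing import List, Tuple

      def post_rank(edited: List[Tuple[str, float]]) -> str:
          """Hypothetical most-frequent post-ranker over (output, log-likelihood)
          pairs, one pair per retrieved-and-edited candidate."""
          counts = Counter(output for output, _ in edited)
          best_ll = {}                                   # highest log-likelihood seen per distinct output
          for output, ll in edited:
              best_ll[output] = max(ll, best_ll.get(output, float("-inf")))
          # The most frequent output wins; the generation score only breaks ties.
          return max(counts, key=lambda o: (counts[o], best_ll[o]))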
Conclusion
  • Conclusion and Future Work

    In this paper, the authors presented a retrieve-edit-rerank framework for seq2seq text generation.
  • By performing analysis on Gigaword, the authors find that there exists room to improve summarization performance with better post-ranking algorithms, a promising direction for future research.
  • This is in line with the overall goal, which is not to find the best possible way to do the post-ranking, but only to show that gains are possible by editing multiple candidates and comparing the results.
  • Moving forward, the authors would like to apply this framework to other retrieve-and-edit based generation scenarios such as dialogue, conversation, and code generation.
Tables
  • Table 1: BLEU scores on the MT datasets. y1 implies using the best retrieved output from Lucene.
  • Table 2: ROUGE scores for Gigaword summarization. y1 implies using the best retrieved output from Lucene.
  • Table 3: Sample outputs from the Gigaword test set. “Ret-ID” indicates which of the 30 retrieved y was used in the input, for example, y1 and the pre-ranked y. For the (most-frequent) post-ranked output, we show the y for which the generated output had the highest generation score (log-likelihood) from the model.
Related work
  • Recent work has developed retrieve-and-edit approaches for many tasks, including dialogue generation (Weston et al., 2018), language modeling (Guu et al., 2018), code generation (Hashimoto et al., 2018), neural machine translation (NMT) (Gu et al., 2018; Zhang et al., 2018; Cao and Xiong, 2018), and post-editing for NMT (Hokamp, 2017; Dabre et al., 2017). Candidate ranking has served as a core part of some retrieval-based models (Ji et al., 2014; Yan et al., 2016), but these models do not edit the retrieved candidates.

    For machine translation, Bulte and Tezcan (2019) developed a retrieve-and-edit based LSTM model called Neural Fuzzy Repair (NFR), which they applied to two MT datasets obtained from Steinberger et al. (2012). Using a retrieval method called sss+ed, which combines keyword-based matching with token edit distance, they showed that concatenating the source and retrieved outputs as the input significantly boosts translation quality. NFR is trained by augmenting the source with up to 3 retrieved outputs, which are fed together into the editing model in several ways. Our approach, instead, simply edits multiple candidates separately and then re-ranks the final results.
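
    As a rough illustration of this style of fuzzy-match retrieval (not the authors' sss+ed implementation, and using a brute-force scan in place of a real Lucene index), the sketch below pre-filters translation-memory entries by keyword overlap and ranks the survivors by token-level edit distance.

      from typing import Dict, List, Tuple

      def token_edit_distance(a: List[str], b: List[str]) -> int:
          """Levenshtein distance computed over tokens rather than characters."""
          prev = list(range(len(b) + 1))
          for i, ta in enumerate(a, start=1):
              curr = [i]
              for j, tb in enumerate(b, start=1):
                  curr.append(min(prev[j] + 1,                    # delete ta
                                  curr[j - 1] + 1,                # insert tb
                                  prev[j - 1] + (ta != tb)))      # substitute ta -> tb
              prev = curr
          return prev[-1]

      def fuzzy_retrieve(query: str, memory: Dict[str, str], k: int = 30) -> List[str]:
          """Return the target sides of the k memory entries whose source sides are
          closest to the query: keyword pre-filter, then edit-distance ranking."""
          q_tokens = query.split()
          q_vocab = set(q_tokens)
          scored: List[Tuple[int, str]] = []
          for src, tgt in memory.items():
              if q_vocab & set(src.split()):                      # cheap keyword overlap filter
                  scored.append((token_edit_distance(q_tokens, src.split()), tgt))
          scored.sort(key=lambda pair: pair[0])
          return [tgt for _, tgt in scored[:k]]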
Reference
  • Bram Bulte and Arda Tezcan. 2019. Neural fuzzy repair: Integrating fuzzy matches into neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1800–1809, Florence, Italy. Association for Computational Linguistics.
  • Qian Cao and Deyi Xiong. 2018. Encoding gated translation memory into neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3042–3047, Brussels, Belgium. Association for Computational Linguistics.
  • Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. 2018. Retrieve, rerank and rewrite: Soft template based neural summarization. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 152–161, Melbourne, Australia. Association for Computational Linguistics.
  • Raj Dabre, Fabien Cromieres, and Sadao Kurohashi. 2017. Enabling multi-source neural machine translation by concatenating source sentences in multiple languages. arXiv preprint arXiv:1702.06135.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6111– 6120, Hong Kong, China. Association for Computational Linguistics.
  • Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. 2018. Search engine guided neural machine translation. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. 2018. Generating sentences by editing prototypes. Transactions of the Association for Computational Linguistics, 6:437–450.
  • Tatsunori B Hashimoto, Kelvin Guu, Yonatan Oren, and Percy S Liang. 2018. A retrieve-and-edit framework for predicting structured outputs. In Advances in Neural Information Processing Systems, pages 10052–10062.
  • Chris Hokamp. 2017. Ensembling factored neural machine translation models for automatic post-editing and quality estimation. In Proceedings of the Second Conference on Machine Translation, pages 647– 654.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, and Yejin Choi. 2018. Learning to write with cooperative discriminators. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1638–1649, Melbourne, Australia. Association for Computational Linguistics.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. ArXiv, abs/1904.09751.
  • Zongcheng Ji, Zhengdong Lu, and Hang Li. 2014. An information retrieval approach to short text conversation. arXiv preprint arXiv:1408.6988.
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48–53, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389, Lisbon, Portugal. Association for Computational Linguistics.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715– 1725, Berlin, Germany. Association for Computational Linguistics.
  • Ralf Steinberger, Andreas Eisele, Szymon Klocek, Spyridon Pilos, and Patrick Schluter. 2012. DGT-TM: A freely available translation memory in 22 languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 454–459, Istanbul, Turkey. European Language Resources Association (ELRA).
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
  • Kai Wang, Xiaojun Quan, and Rui Wang. 2019. BiSET: Bi-directional selective encoding with template for abstractive summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2153–2162, Florence, Italy. Association for Computational Linguistics.
  • Jason Weston, Emily Dinan, and Alexander H Miller. 2018. Retrieve and refine: Improved sequence generation models for dialogue. arXiv preprint arXiv:1808.04776.
  • Jiawei Wu, Xin Wang, and William Yang Wang. 2019. Extract and edit: An alternative to back-translation for unsupervised neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1173–1183, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Rui Yan, Yiping Song, and Hua Wu. 2016. Learning to respond with deep neural networks for retrieval-based human-computer conversation system. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 55–64.
  • Jingyi Zhang, Masao Utiyama, Eiichro Sumita, Graham Neubig, and Satoshi Nakamura. 2018. Guiding neural machine translation with retrieved translation pieces. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1325– 1335, New Orleans, Louisiana. Association for Computational Linguistics.