PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation

EMNLP 2020, pp. 781–793

Abstract

Pre-trained Transformers have enabled impressive breakthroughs in generating long and fluent text, yet their outputs are often “rambling” without coherently arranged content. In this work, we present a novel content-controlled text generation framework, PAIR, with planning and iterative refinement, which is built upon a large model, BART. ...

Introduction
  • Large pre-trained language models are the cornerstone of many state-of-the-art models in various natural language understanding and generation tasks (Devlin et al, 2019; Liu et al, 2019; Lewis et al, 2020), yet they are far from perfect.
  • While models like GPT-2 (Radford et al, 2019) are able to produce plausible text, their spontaneous nature limits their utility in actual applications, e.g., users cannot specify what content to include, and in what order.
  • Content Plan: (1) a communist3 ▷ begin with8 ▷ coherent ideology15 ▷ [SEN]21 (2) [SEN]4 (3) no evidence2 ▷ any coherent8 ▷ held beliefs12 ▷ any topic15 ▷ [SEN]18 (each subscript gives the planned start position of the keyphrase within its sentence; [SEN] marks the planned sentence length).
  • Template: (1) __0 __1 __2 a communist __5 __6 __7 begin with __10 __11 __12 __13 __14 coherent ideology __17 __18 __19 __20 (2) __0 __1 __2 __3 (3) __0 __1 no evidence __4 __5 __6 __7 any coherent __10 (a sketch of how such a template is assembled from a plan follows this list).
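To make the plan and template notation concrete, below is a minimal sketch (an illustration written for this summary, not the authors' code) that expands one sentence's content plan, i.e. keyphrases paired with planned start positions plus the [SEN] position as the sentence length, into the masked template the generator later fills. It places keyphrases at the word level, whereas the paper positions WordPiece tokens; the helper name build_template is invented here.

from typing import List, Optional, Tuple

def build_template(plan: List[Tuple[str, int]], sen_length: int) -> str:
    """Expand a single sentence's content plan into a masked template string.

    plan:       (keyphrase, start position) pairs, e.g. [("a communist", 3), ("begin with", 8)]
    sen_length: planned sentence length, i.e. the position of the [SEN] marker.
    """
    slots: List[Optional[str]] = [None] * sen_length       # one slot per target token
    for phrase, start in plan:
        for offset, word in enumerate(phrase.split()):
            slots[start + offset] = word                    # pin keyphrase words at their planned positions
    # every remaining slot becomes a mask (__i) for the seq2seq model to fill
    return " ".join(word if word is not None else f"__{i}" for i, word in enumerate(slots))

if __name__ == "__main__":
    sentence_1 = [("a communist", 3), ("begin with", 8), ("coherent ideology", 15)]
    print(build_template(sentence_1, sen_length=21))
    # __0 __1 __2 a communist __5 __6 __7 begin with __10 __11 __12 __13 __14 coherent ideology __17 __18 __19 __20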
Highlights
  • Large pre-trained language models are the cornerstone of many state-of-the-art models in various natural language understanding and generation tasks (Devlin et al, 2019; Liu et al, 2019; Lewis et al, 2020), yet they are far from perfect
  • Outputs generated in a single pass may suffer from incorrectness and incoherence, so we propose an iterative refinement procedure to improve quality
  • We evaluate our generation and planning models on datasets from three distinct domains for multi-paragraph-level text generation: (1) argument generation (ARGGEN) (Hua et al, 2019), to produce a counter-argument to refute a given proposition; (2) writing opinionated articles (OPINION), e.g., editorials and op-eds, to show idea exchange on a given subject; and (3) composing news reports (NEWS) to describe events
  • We present the same 50 prompts from the previous evaluation on argument generation, and an additional 50 samples for opinion article writing, to the same group of human judges
  • We investigate an iterative refinement algorithm that works with the sequence-to-sequence models to improve generation quality with flexible editing (a schematic sketch follows this list)
  • Both automatic evaluation and human judgments show that our model with planning and refinement enhances the relevance and coherence of the generated content
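The refinement idea can be stated schematically: decode a draft, re-mask the tokens the model was least confident about while protecting the planned keyphrases, and let the sequence-to-sequence model fill those positions again for a few rounds. The sketch below is a hedged paraphrase of that loop, not the authors' exact procedure; decode_with_scores, the masking ratio, and the iteration count are placeholders.

from typing import Callable, List, Set, Tuple

# Hypothetical decoder interface: given a (partially masked) token sequence, return the
# completed sequence of the same length plus a per-token probability for each position.
Decoder = Callable[[List[str]], Tuple[List[str], List[float]]]

MASK = "<mask>"

def iterative_refine(draft: List[str], probs: List[float], keyphrase_positions: Set[int],
                     decode_with_scores: Decoder, n_iters: int = 3,
                     mask_ratio: float = 0.2) -> List[str]:
    """Schematic refinement: repeatedly re-mask low-confidence tokens and regenerate them."""
    tokens, scores = list(draft), list(probs)
    for _ in range(n_iters):
        # rank editable (non-keyphrase) positions by the decoder's confidence in the current token
        editable = [i for i in range(len(tokens)) if i not in keyphrase_positions]
        editable.sort(key=lambda i: scores[i])
        for i in editable[:max(1, int(mask_ratio * len(editable)))]:
            tokens[i] = MASK                         # only the weakest spots are opened for editing
        tokens, scores = decode_with_scores(tokens)  # the seq2seq model fills the masks again
    return tokens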
Results
  • 5.1 Automatic Evaluation

    The authors report scores with BLEU (Papineni et al, 2002), which is based on n-gram precision; ROUGE-L (Lin, 2004), measuring recall of the longest common subsequence; and METEOR (Lavie and Agarwal, 2007), which accounts for paraphrase (a self-contained ROUGE-L illustration follows this list).
  • PAIRfull that has access to full content plans obtains significantly better scores than PAIRlight that only includes keyphrase assignments but not their positions.
  • The authors find that the planner often falls short of accurately positioning the given keyphrases, leading to degraded generation performance.
  • This points to a potential direction for future work, where a better positioning model should be developed
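Of these metrics, ROUGE-L is the easiest to reproduce by hand: it is an F-measure computed from the longest common subsequence (LCS) between an output and its reference. The snippet below is a self-contained, word-level illustration of that definition (no stemming or multi-reference handling), not the toolkit behind the reported numbers; beta = 1.2 follows common implementations.

def lcs_length(a, b):
    """Length of the longest common subsequence between token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-score: LCS-based precision and recall combined, weighted toward recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

print(rouge_l("the police arrested four teenagers", "four new jersey teenagers were arrested"))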
Conclusion
  • The authors present a novel content-controlled generation framework that adds content planning to large pretrained Transformers without modifying model architecture.
  • A BERT-based planning model is first designed to assign and position keyphrases into different sentences.
  • The authors investigate an iterative refinement algorithm that works with the sequence-to-sequence models to improve generation quality with flexible editing (a sketch of how planning, generation, and refinement compose follows this list)
  • Both automatic evaluation and human judgments show that the model with planning and refinement enhances the relevance and coherence of the generated content
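Read together, those pieces compose into a single pipeline: plan keyphrase positions, render a masked template, fill it with the sequence-to-sequence model, then refine. The sketch below only illustrates that composition; plan, fill, and refine are hypothetical stand-ins for the BERT-based planner, the BART generator, and the refinement loop, not an API from the paper.

from typing import Callable, List, Tuple

def pair_generate(prompt: str, keyphrases: List[str],
                  plan: Callable[[str, List[str]], str],
                  fill: Callable[[str, str], Tuple[List[str], List[float]]],
                  refine: Callable[[List[str], List[float]], List[str]]) -> str:
    """Illustrative composition of the planning, generation, and refinement stages.

    plan:   planner -> masked template with keyphrases placed at predicted positions
    fill:   seq2seq -> draft tokens plus a per-token probability for every position
    refine: refiner -> re-masks and regenerates low-confidence, non-keyphrase tokens
    """
    template = plan(prompt, keyphrases)    # content planning (BERT-based in the paper)
    draft, probs = fill(prompt, template)  # one-pass generation over the template (BART)
    return " ".join(refine(draft, probs))  # iterative refinement of weak spots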
Summary
  • Introduction:

    Large pre-trained language models are the cornerstone of many state-of-the-art models in various natural language understanding and generation tasks (Devlin et al, 2019; Liu et al, 2019; Lewis et al, 2020), yet they are far from perfect.
  • While models like GPT-2 (Radford et al, 2019) are able to produce plausible text, their spontaneous nature limits their utility in actual applications, e.g., users cannot specify what content to include, and in what order.
  • Content Plan: (1) a communist3 ▷ begin with8 ▷ coherent ideology15 ▷ [SEN]21 (2) [SEN]4 (3) no evidence2 ▷ any coherent8 ▷ held beliefs12 ▷ any topic15 ▷ [SEN]18 (subscripts give each keyphrase's planned start position within its sentence; [SEN] marks the planned sentence length).
  • Template: (1) __0 __1 __2 a communist __5 __6 __7 begin with __10 __11 __12 __13 __14 coherent ideology __17 __18 __19 __20 (2) __0 __1 __2 __3 (3) __0 __1 no evidence __4 __5 __6 __7 any coherent __10
  • Objectives:

    This work aims to bring new insights into how to effectively incorporate content plans into large models to generate more relevant and coherent text.
  • The authors aim to investigate whether contentcontrolled generation with ground-truth content plans resembles human-written text by studying discourse phenomena.
  • This study aims to evaluate three text generation systems for counter-argument generation resembling the Reddit ChangeMyView (CMV) style.
  • This study aims to compare several intervention strategies applied to the same model
  • Results:

    5.1 Automatic Evaluation

    The authors report scores with BLEU (Papineni et al, 2002), which is based on n-gram precision; ROUGE-L (Lin, 2004), measuring recall of the longest common subsequences; and METEOR (Lavie and Agarwal, 2007), which accounts for paraphrase.
  • PAIRfull that has access to full content plans obtains significantly better scores than PAIRlight that only includes keyphrase assignments but not their positions.
  • The authors find that the planner often falls short of accurately positioning the given keyphrases, leading to degraded generation performance.
  • This points to a potential direction for future work, where a better positioning model should be developed
  • Conclusion:

    The authors present a novel content-controlled generation framework that adds content planning to large pretrained Transformers without modifying model architecture.
  • A BERT-based planning model is first designed to assign and position keyphrases into different sentences.
  • The authors investigate an iterative refinement algorithm that works with the sequence-to-sequence models to improve generation quality with flexible editing
  • Both automatic evaluation and human judgments show that the model with planning and refinement enhances the relevance and coherence of the generated content
Tables
  • Table1: Statistics of the three datasets. We report average lengths of the prompt and the target generation, number of unique keyphrases (# KP) used in the input, and the percentage of content words in target covered by the keyphrases (KP Cov.)
  • Table2: Key results on argument generation, opinion article writing, and news report generation. BLEU-4 (B4), ROUGE-L (R-L), METEOR (MTR), and average output lengths are reported (for references, the lengths are 100, 166, and 250, respectively). PAIRlight, using keyphrase assignments only, consistently outperforms baselines; adding keyphrase positions, PAIRfull further boosts scores. Improvements by our models over baselines are all significant (p < 0.0001, approximate randomization test). Iterative refinement helps on both setups
  • Table3: Human evaluation for argument generation on fluency, coherence, and relevance, with 5 as the best. Krippendorff’s α values are 0.28, 0.30, and 0.37, respectively. Our model outputs are significantly more coherent and relevant than KPSEQ2SEQ (∗: p < 0.0001), with comparable fluency
  • Table4: Sample outputs in the news and opinion domain. Keyphrases assigned to different sentences are in boldface and color-coded
  • Table5: Percentages of samples preferred by human judges before and after refinement [Left]; with and without enforcing keyphrases to appear at the predicted positions [Right]. Ties are omitted
  • Table6: Statistics on generated templates by our content planner. Tokens are measured in units of WordPiece (Sennrich et al, 2016). KP distance denotes the average number of tokens between two keyphrases that are in the same sentence. Both system output (sys) and human reference (ref) are reported
Related Work
  • Content Planning as a Generation Component. Despite the impressive progress made in many generation tasks, neural systems are known to produce low-quality content (Wiseman et al, 2017; Rohrbach et al, 2018), often with low relevance (Li et al, 2016) and poor discourse structure (Zhao et al, 2017; Xu et al, 2020). Consequently, planning modules are designed and added into neural systems to enhance content relevance (Wiseman et al, 2018; Moryossef et al, 2019; Yao et al, 2019; Hua and Wang, 2019). However, it remains an open question how to include content plans in large models, given the additional and expensive model retraining required. This work innovates by adding content plans as masked templates and designing a refinement strategy to further boost generation performance, without architectural change.

    Controlled Text Generation. Our work is also in line with the study of controllability of neural text generation models. This includes manipulating the syntax (Dušek and Jurčíček, 2016; Goyal and Durrett, 2020) and semantics (Wen et al, 2015; Chen et al, 2019) of the output. Specific applications encourage the model to cover a given topic (Wang et al, 2017; See et al, 2019), mention specified entities (Fan et al, 2018), or display a certain attribute (Hu et al, 2017; Luo et al, 2019; Balakrishnan et al, 2019). However, most existing work relies on model engineering, limiting the generalizability to new domains and adaptability to large pre-trained Transformers. One exception is the Plug and Play model (Dathathri et al, 2020), which directly modifies the key and value states of GPT-2 (Radford et al, 2019). However, since the signal is derived from the whole generated text, it is too coarse to provide precise sentence-level content control. Here, we instead gain fine-grained controllability through keyphrase assignment and positioning per sentence, which can be adapted to any off-the-shelf pre-trained Transformer generators.
Funding
  • This research is supported in part by National Science Foundation through Grant IIS-1813341 and Nvidia GPU gifts
Study Subjects and Analysis
samples: 50
5.2 Human Evaluation. We hire four proficient English speakers to rate three aspects of the generated arguments on a scale of 1 (worst) to 5 (best): fluency, coherence—if the information organization is natural and logical, and relevance—if the topic is related to the prompt and whether the stance is correct. 50 samples are randomly selected, with system outputs by KPSEQ2SEQ, PAIRfull and PAIRlight shown to human judges in random order.

New Jersey teenagers: 4
Prompt (News): 4 Arrested in Theft of Baby Jesus Figurines. PAIRfull: Four New Jersey teenagers arrested yesterday were accused of stealing more than 25 plastic baby Jesus figurines from a church before they burn in a bonfire, the police said. The police in Sayreville, N.J., arrested Michael Payne, 18, and T.J

samples: 50
We further ask whether human judges prefer the refined text and whether enforcing keyphrases to be generated yields noticeable content improvement. In a second study, we present the same 50 prompts from the previous evaluation on argument generation, and an additional 50 samples for opinion article writing, to the same group of human judges. For each sample, PAIRfull’s outputs with and without refinement are shown in random order

datasets: 3
Statistics of the three datasets. We report average lengths of the prompt and the target generation, number of unique keyphrases (# KP) used in the input, and the percentage of content words in target covered by the keyphrases (KP Cov.). Key results on argument generation, opinion article writing, and news report generation. BLEU-4 (B4), ROUGE-L (R-L), METEOR (MTR), and average output lengths are reported (for references, the lengths are 100, 166, and 250, respectively). PAIRlight, using keyphrase assignments only, consistently outperforms baselines; adding keyphrase positions, PAIRfull further boosts scores. Improvements by our models over baselines are all significant (p < 0.0001, approximate randomization test). Iterative refinement helps on both setups

References
  • Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, and Rajen Subba. 2019. Constrained decoding for neural NLG from compositional representations in task-oriented dialogue. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 831– 844, Florence, Italy. Association for Computational Linguistics.
  • Charles B. Callaway. 2003. Integrating discourse markers into a pipelined natural language generation architecture. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 264–271, Sapporo, Japan. Association for Computational Linguistics.
  • Lynn Carlson, Daniel Marcu, and Mary Ellen Okurovsky. 2001. Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. In Proceedings of the Second SIGdial Workshop on Discourse and Dialogue.
  • Mingda Chen, Qingming Tang, Sam Wiseman, and Kevin Gimpel. 2019. A multi-task approach for disentangling syntax and semantics in sentence representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2453–2464, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text generation. In International Conference on Learning Representations.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pages 13042–13054.
  • Pablo A. Duboue and Kathleen R. McKeown. 2001. Empirically estimating order constraints for content planning in generation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 172–179, Toulouse, France. Association for Computational Linguistics.
  • Ondřej Dušek and Filip Jurčíček. 2016. Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 45–51, Berlin, Germany. Association for Computational Linguistics.
  • WA Falcon. 2019. PyTorch Lightning. GitHub. https://github.com/williamFalcon/pytorch-lightning.
  • Markus Freitag, Isaac Caswell, and Scott Roy. 2019. APE at scale and its implications on MT evaluation biases. In Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers), pages 34–44, Florence, Italy. Association for Computational Linguistics.
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6112– 6121, Hong Kong, China. Association for Computational Linguistics.
  • Tanya Goyal and Greg Durrett. 2020. Neural syntactic preordering for controlled paraphrase generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 238– 252, Online. Association for Computational Linguistics.
  • Brigitte Grote and Manfred Stede. 1998. Discourse marker choice in sentence planning. In Natural Language Generation.
  • Chris Hokamp and Qun Liu. 2017. Lexically constrained decoding for sequence generation using grid beam search. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, Vancouver, Canada. Association for Computational Linguistics.
  • Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. In International Conference on Learning Representations.
  • J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. 2019. Improved lexically constrained decoding for translation and monolingual rewriting. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 839–850, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1587–1596. JMLR. org.
  • Angela Fan, David Grangier, and Michael Auli. 2018. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 45–54, Melbourne, Australia. Association for Computational Linguistics.
  • Xinyu Hua, Zhe Hu, and Lu Wang. 2019. Argument generation with retrieval, planning, and realization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2661–2672, Florence, Italy. Association for Computational Linguistics.
  • Xinyu Hua and Lu Wang. 2019. Sentence-level content planning and style specification for neural text generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 591–602, Hong Kong, China. Association for Computational Linguistics.
  • Yangfeng Ji and Jacob Eisenstein. 2014. Representation learning for text-level discourse parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13–24, Baltimore, Maryland. Association for Computational Linguistics.
  • Jungo Kasai, James Cross, Marjan Ghazvininejad, and Jiatao Gu. 2020. Non-autoregressive machine translation with disentangled context transformer. In Proc. of ICML.
  • Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caiming Xiong, and Richard Socher. 2019. Ctrl: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization.
  • Alon Lavie and Abhaya Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 228–231, Prague, Czech Republic. Association for Computational Linguistics.
  • Carolin Lawrence, Bhushan Kotnis, and Mathias Niepert. 2019. Attending to future tokens for bidirectional sequence generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1–10, Hong Kong, China. Association for Computational Linguistics.
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1173– 1182, Brussels, Belgium. Association for Computational Linguistics.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pretraining for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119, San Diego, California. Association for Computational Linguistics.
  • Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
  • Chin-Yew Lin and Eduard Hovy. 2000. The automated acquisition of topic signatures for text summarization. In COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Fuli Luo, Damai Dai, Pengcheng Yang, Tianyu Liu, Baobao Chang, Zhifang Sui, and Xu Sun. 2019. Learning to control the fine-grained sentiment for story ending generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6020–6026, Florence, Italy. Association for Computational Linguistics.
  • Elman Mansimov, Alex Wang, Sean Welleck, and Kyunghyun Cho. 2019. A generalized framework of sequence generation with application to undirected sequence models. arXiv preprint arXiv:1905.12790.
  • George A. Miller. 1994. WordNet: A lexical database for English. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994.
  • Amit Moryossef, Yoav Goldberg, and Ido Dagan. 2019. Step-by-step: Separating planning from realization in neural data-to-text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2267–2277, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Roman Novak, Michael Auli, and David Grangier. 2016. Iterative refinement for machine translation. arXiv preprint arXiv:1610.06602.
  • Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.
  • Matt Post and David Vilar. 2018. Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1314–1324, New Orleans, Louisiana. Association for Computational Linguistics.
  • Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind K Joshi, and Bonnie L Webber. 2008. The penn discourse treebank 2.0. In LREC. Citeseer.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
  • Lena Reed, Shereen Oraby, and Marilyn Walker. 2018. Can neural generators for dialogue learn sentence planning and discourse structuring? In Proceedings of the 11th International Conference on Natural Language Generation, pages 284–295, Tilburg University, The Netherlands. Association for Computational Linguistics.
  • Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, and Kate Saenko. 2018. Object hallucination in image captioning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4035–4045, Brussels, Belgium. Association for Computational Linguistics.
  • Evan Sandhaus. 2008. The new york times annotated corpus. Linguistic Data Consortium, Philadelphia, 6(12):e26752.
  • Abigail See, Stephen Roller, Douwe Kiela, and Jason Weston. 2019. What makes a good conversation? how controllable attributes affect human judgments. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1702–1723, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715– 1725, Berlin, Germany. Association for Computational Linguistics.
  • Amanda Stent, Rashmi Prasad, and Marilyn Walker. 2004. Trainable sentence planning for complex information presentations in spoken dialog systems. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL04), pages 79–86, Barcelona, Spain.
  • Di Wang, Nebojsa Jojic, Chris Brockett, and Eric Nyberg. 2017. Steering output style and topic in neural response generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2140–2150, Copenhagen, Denmark. Association for Computational Linguistics.
  • Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, PeiHao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1711–1721, Lisbon, Portugal. Association for Computational Linguistics.
  • Jason Weston, Emily Dinan, and Alexander Miller. 2018. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 87–92, Brussels, Belgium. Association for Computational Linguistics.
  • Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263, Copenhagen, Denmark. Association for Computational Linguistics.
  • Sam Wiseman, Stuart Shieber, and Alexander Rush. 2018. Learning neural templates for text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3174–3187, Brussels, Belgium. Association for Computational Linguistics.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. Huggingface’s transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. Deliberation networks: Sequence generation beyond one-pass decoding. In Advances in Neural Information Processing Systems, pages 1784–1794.
  • Jiacheng Xu, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Discourse-aware neural extractive text summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5021–5031, Online. Association for Computational Linguistics.
  • Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, and Rui Yan. 2019. Planand-write: Towards better automatic storytelling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7378–7385.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 654–664, Vancouver, Canada. Association for Computational Linguistics.
Reproducibility
  • Computing Infrastructure. Our model is built upon the PyTorch transformers-2.6.0 library by Wolf et al. (2019), with Pytorch-Lightning-0.7.3 (Falcon, 2019) for training routines. To improve training efficiency, we adopt mixed-precision floating point (FP16) computation using the O2 option.
  • Model Sizes. Our generation model has the same architecture as BART (Lewis et al., 2020) with 406M parameters. The content planner is built on top of BERTbase, which has 110M parameters.
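As a quick sanity check on those sizes, the two backbones can be loaded and counted with the Hugging Face transformers API. The checkpoint names facebook/bart-large and bert-base-uncased are assumptions for "BART" and "BERTbase"; the paper does not state exact identifiers.

from transformers import BartForConditionalGeneration, BertModel

def count_params(model) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large")  # generator backbone (~406M)
bert = BertModel.from_pretrained("bert-base-uncased")                       # planner backbone (~110M)
print(f"BART-large: {count_params(bart) / 1e6:.0f}M parameters, BERT-base: {count_params(bert) / 1e6:.0f}M parameters")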