Facts2Story: Controlling Text Generation by Key Facts

Eyal Orbach

COLING 2020, pp. 2329–2345.

We propose a controlled generation task which is based on expanding a sequence of facts, expressed in natural language, into a longer narrative

Abstract:

Recent advancements in self-attention neural network architectures have raised the bar for open-ended text generation. Yet, while current methods are capable of producing a coherent text which is several hundred words long, attaining control over the content that is being generated -- as well as evaluating it -- are still open questions...

Code: https://github.com/eyal-orbach/Facts2Story-XLNetPlanCloze

Data: https://github.com/eyal-orbach/Facts2Story-data

Introduction
  • Story generation is a challenging task in natural language processing, which requires automated systems to produce creative, open-ended text that remains coherent, cohesive and preferably engaging: an ability humans are clearly capable of.
  • Advancements in self-attention architectures and large-scale training resulted in a series of pre-trained language models (Radford et al., 2019; Zellers et al., 2019) that demonstrate an ability to remain on topic while generating cohesive passages that are several hundred words long.
  • Can the authors harness the power of large pre-trained models to generate coherent text, while allowing finer-grained control over the generated text?
Highlights
  • Story generation is a challenging task in natural language processing, which requires automated systems to produce creative, open-ended text that remains coherent, cohesive and preferably engaging: an ability humans are clearly capable of
  • Advancements in self-attention architectures and large-scale training resulted in a series of pre-trained language models (Radford et al., 2019; Zellers et al., 2019) that demonstrate an ability to remain on topic while generating cohesive passages that are several hundred words long
  • We introduce a challenging task for controlled text generation that has clear criteria for adherence to the given input while maintaining a high degree of freedom for the generated text
  • XLNet’s pre-training involved predicting only 15% of the tokens while being exposed to the rest, half the proportion predicted under BART’s pre-training objective; yet, despite having a smaller parameter count, our suggested cloze-XLNet method surpasses BART on all our reported metrics
  • We show a simple planning technique of copying and structuring that reframes the objective as formulating a cloze task and then filling in the blanks with a pre-trained, permuted-order, autoregressive language model, achieving competitive results in regard to the coherence of the generated text and substantially superior performance in adhering to the key-facts controlling mechanism (a sketch of this planning step follows this list)
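The copy-and-structure planning step can be pictured with the minimal sketch below. It is not the authors' code: the "<mask>" placeholder, the per-gap blank budget, and the example facts are assumptions made for illustration. It only shows how the ordered key facts are copied verbatim into a template with blanks between them, turning generation into a cloze-filling problem for a permutation-order autoregressive model such as XLNet.

```python
# Illustrative sketch of the copy-and-structure planning step (assumptions:
# a generic "<mask>" placeholder and a fixed per-gap token budget; the real
# system's tokenizer, gap sizes, and fact-ordering logic may differ).

MASK = "<mask>"      # placeholder for a token the model must fill in
MASKS_PER_GAP = 40   # hypothetical number of blanks between consecutive facts


def build_cloze_template(facts):
    """Copy the key facts, in plot order, into a template with blanks between them."""
    gap = " ".join([MASK] * MASKS_PER_GAP)
    parts = [gap]            # allow generated text before the first fact
    for fact in facts:
        parts.append(fact)   # the fact itself is copied verbatim (never masked)
        parts.append(gap)    # leave blanks to be filled after the fact
    return " ".join(parts)


facts = [
    "John moves to a small town.",          # hypothetical key facts,
    "John discovers the mayor's secret.",   # already ordered by plot position
]
print(build_cloze_template(facts))
```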
Methods
  • 8.1 Models Training and Inference

    The authors hold out one ninth of the training material to serve as a validation dataset to avoid overfitting during fine-tuning.
  • The authors use the code published by Ziegler et al. (2019), utilizing its support for additional embeddings, and train with the configuration for story generation, using the 117M-parameter GPT2 model plus additional parameters for the encoder, resulting in 181M parameters.
  • The authors fine-tune BART-Large, with its 406M parameters, using the implementation available in the Transformers framework (Wolf et al., 2019), fine-tuning until reaching minimal loss on the validation set.
  • Generation for the test evaluations is done with top-k sampling of 40 and a temperature of 0.85 (a decoding sketch follows this list)
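As a concrete illustration of the reported inference settings, the sketch below samples from an off-the-shelf BART-Large checkpoint with the Hugging Face Transformers generate API using top-k of 40 and a temperature of 0.85. The checkpoint name, the input facts, and the maximum length are placeholders, not the authors' fine-tuned pipeline.

```python
# Minimal decoding sketch with Hugging Face Transformers: draws a continuation
# with top-k = 40 and temperature = 0.85, matching the reported inference
# settings. Checkpoint and input facts are illustrative placeholders only.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

facts = "John moves to a small town. John discovers the mayor's secret."
inputs = tokenizer(facts, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    do_sample=True,    # sample instead of greedy or beam-search decoding
    top_k=40,          # restrict sampling to the 40 most probable tokens
    temperature=0.85,  # sharpen the distribution slightly before sampling
    max_length=400,    # rough story length; not specified in the summary
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```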
Results
  • Inspection of the generated stories exposes that the pseudo-self-attention mechanism was successful in leading the decoder to produce entity names that appeared in the input facts, along with various themes like “aliens” or “murder”, but failed to copy, or otherwise represent, the relations described in the facts, producing text that is more loosely related to the input than intended
  • The authors claim this can be attributed to the left-to-right unidirectional pre-training, which requires substantially more “retraining” for the model to learn to attend to representations that signify text in the “future”, i.e., to the right of the text being generated
Conclusion
  • The authors suggest a challenging task of controlling story generation content by key facts.
  • The authors demonstrate an approach for deriving a large corpus for this task, suitable for supervised fine-tuning, and evaluate the ability of notable massively pre-trained models, fine-tuned on this corpus, to generate stories adhering to the suggested representations of content.
  • The authors show a simple planning technique of copying and structuring that reframes the objective as formulating a cloze task and then filling in the blanks with a pre-trained, permuted-order, autoregressive language model, achieving competitive results in regard to the coherence of the generated text and substantially superior performance in adhering to the key-facts controlling mechanism
Tables
  • Table1: Averaged rating per model
  • Table2: Generated Stories for Facts Set 1
  • Table3: Generated Stories for Facts Set 2
  • Table4: Generated Stories for Facts Set 3
Related work
  • 2.1 Plotline Representations in Neural Story Generation

    Earlier works in neural network story generation experiment with recurrent or convolutional sequence-to-sequence architectures, encountering difficulties in generating long text that stays on track. While addressing this challenge, these works yield interesting mechanisms to represent the plotline of a story. Martin et al. (2018) represent a plot as a chain of events, learning to generate a sentence from each event, while Yao et al. (2018) alternatively build the plotline as a chain of keywords. Kiddon et al. (2016) avoid the need to represent each of the desired sentences by maintaining a checklist of required words and implementing a gating mechanism to insert these words and track which were used, demonstrating these abilities on cooking recipes. Zhai et al. (2019) develop this notion further by using events as the ingredients in the checklist, but also condition the generated text on the desired next event, concluding that their model still generates shorter stories with less event coverage than those produced by humans.

    Fan et al. (2018) collect a corpus of writing prompts and their appropriate stories as written by Reddit users. Implementing a convolutional sequence-to-sequence architecture, they train on generating the stories from their prompts, noting a tendency for such architectures to ignore the input and focus on the local dependencies required for language modeling rather than the more complex dependencies between the prompt and the text. To evaluate the correlation of a story with its prompt, they suggest measuring perplexity when using the corresponding prompts versus randomly chosen ones (a sketch of this check follows this paragraph). We claim this is not a strong enough requirement and does not suggest correlation in a semantically meaningful manner. Drissi et al. (2018) train on extractive summaries and their respective text, but do not measure the degree to which the outputs correlate with their respective summaries, as they regard this technique only as a step towards improving the coherence of the generated output, concluding that human evaluations do not suggest it is helpful.
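The prompt-relevance check described above can be sketched as follows, under stated assumptions: a generic causal language model (GPT2 here, purely for illustration, not Fan et al.'s convolutional model) scores each story conditioned on its own prompt and on a randomly paired one, and the two averaged perplexities are compared. The prompts and stories below are hypothetical.

```python
# Sketch of the matched-vs-random-prompt perplexity check: score only the
# story tokens, conditioned either on the true prompt or a shuffled one.
import math
import random

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def story_perplexity(prompt, story):
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    story_ids = tokenizer(" " + story, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, story_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.size(1)] = -100   # ignore prompt positions in the loss
    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss
    return math.exp(loss.item())             # per-token perplexity of the story


prompts = ["A ship lands in the desert.", "Two friends open a bakery."]
stories = ["The crew stepped out into the sand.", "Flour covered every surface."]

matched = [story_perplexity(p, s) for p, s in zip(prompts, stories)]
shuffled_prompts = random.sample(prompts, len(prompts))
shuffled = [story_perplexity(p, s) for p, s in zip(shuffled_prompts, stories)]
print(sum(matched) / len(matched), sum(shuffled) / len(shuffled))
```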
Funding
  • The derived corpus has a ratio of 1/6 between the number of words in the key facts and the full plot, requiring the model to produce substantially more text than the context it is exposed to: the unsupervised denoising objectives used in pre-training the models addressed in this paper, as well as other notable models, typically train while masking only between 15% and 30% of the tokens (Yang et al., 2019; Lewis et al., 2019; Devlin et al., 2018), providing a substantially larger visible context (a back-of-the-envelope comparison follows this list)
  • XLNet’s pre-training involved predicting only 15% of the tokens while being exposed to the rest, half the proportion predicted under BART’s pre-training objective; yet, despite also having a smaller parameter count, our suggested cloze-XLNet method surpasses BART on all our reported metrics
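The back-of-the-envelope comparison referenced above uses only the figures quoted in this section (a 1/6 ratio of key-fact words to plot words versus 15% to 30% masking during pre-training); it is illustrative arithmetic, not a measurement from the paper.

```python
# Compare how much text must be produced at fine-tuning time versus during the
# denoising pre-training of models such as XLNet, BART, or BERT, using only the
# figures quoted in this section.
facts_to_plot_ratio = 1 / 6
fraction_to_generate = 1 - facts_to_plot_ratio   # ~83% of the target text

print(f"fine-tuning task: generate ~{fraction_to_generate:.0%} of the tokens")
for masked in (0.15, 0.30):
    print(f"pre-training objective: predict ~{masked:.0%} of the tokens")
```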
Reference
  • David Bamman, Brendan O’Connor, and Noah A. Smith. 2013. Learning latent personas of film characters. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 352–361.
  • Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI’07, pages 2670–2676, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
  • Jonathan D. Culler. 2001. The Pursuit of Signs: Semiotics, Literature, Deconstruction. Routledge.
  • Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2019. Plug and play language models: A simple approach to controlled text generation.
  • Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: Clause-based open information extraction. In Proceedings of the 22nd International Conference on World Wide Web, pages 355–366. ACM.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Mehdi Drissi, Olivia Watkins, and Jugal Kalita. 2018. Hierarchical text generation using an outline.
  • Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation.
  • Kiril Gashteovski, Rainer Gemulla, and Luciano del Corro. 2017. MinIE: Minimizing facts in open information extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2630–2640, Copenhagen, Denmark, September. Association for Computational Linguistics.
  • Brent Harrison, Christopher Purdy, and Mark O. Riedl. 2017. Toward automated story generation with Markov chain Monte Carlo methods and deep neural networks. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration.
  • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77.
  • Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation.
  • Chloe Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.
  • Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81.
  • Lara J. Martin, Prithviraj Ammanabrolu, Xinyu Wang, William Hancock, Shruti Singh, Brent Harrison, and Mark O. Riedl. 2018. Event representations for automated story generation with deep neural nets. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Ken McRae, George S. Cree, Mark S. Seidenberg, and Chris McNorgan. 2005. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4):547–559.
  • Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
  • Marco Ponza, Luciano Del Corro, and Gerhard Weikum. 2018. Facts that matter. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1043–1048, Brussels, Belgium, October–November. Association for Computational Linguistics.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer.
  • Justin Robischon. 2015. Wikipedia movie plots. https://www.kaggle.com/jrobischon/wikipedia-movie-plots, October.
  • Wilson L. Taylor. 1953. “Cloze procedure”: A new tool for measuring readability. Journalism & Mass Communication Quarterly, 30:415–433.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace’s Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding.
  • Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, and Rui Yan. 2018. Plan-and-write: Towards better automatic storytelling.
  • Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news.
  • Fangzhou Zhai, Vera Demberg, Pavel Shkadzko, Wei Shi, and Asad Sayeed. 2019. A hybrid model for globally coherent story generation. In Proceedings of the Second Workshop on Storytelling, pages 34–45.
  • Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal common-sense inference. Transactions of the Association for Computational Linguistics, 5:379–395.
  • Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, and Alexander M. Rush. 2019. Encoder-agnostic adaptation for conditional language generation.