Paragraph-Level Commonsense Transformers with Recurrent Memory

Abstract:

Human understanding of narrative texts requires making commonsense inferences beyond what is stated in the text explicitly. A recent model, COMeT, can generate such inferences along several dimensions such as pre- and post-conditions, motivations, and mental-states of the participants. However, COMeT was trained on short phrases, and is...

Introduction
Highlights
  • Narrative understanding is a long-standing challenge in the field of natural language processing (NLP) (Charniak 1972; Winograd 1972)
  • A key component of our distant supervision approach is the availability of sentence-level commonsense inferences
  • We introduced a new task of discourse-aware commonsense inference over narratives
  • We are interested in exploring further extensions of our work to downstream paragraph- and narrative-level tasks that may benefit from access to commonsense knowledge
Methods
  • All models are implemented using the Transformers package (Wolf et al 2019), and trained for a maximum of 20 epochs.
  • Training is performed using an Adam optimizer with linear warmup (Kingma and Ba 2015).
  • The authors simulate a batch size of 16 using gradient accumulation and an actual batch size of 4.
  • The learning rate is 2e−5.
  • The authors use the 124M-parameter version of GPT-2, which was pretrained on WebText.
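The training recipe above — linear-warmup learning rate, and gradient accumulation over micro-batches of 4 to simulate a batch of 16 — can be sketched as follows. This is a minimal numpy illustration on a toy least-squares model, not the authors' code; all function names and hyperparameter defaults here are assumptions for demonstration.

```python
import numpy as np

def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=100, total_steps=1000):
    """Ramp linearly up to base_lr, then decay linearly back to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

def train(X, y, micro_batch=4, accum_steps=4, total_steps=40):
    """Toy least-squares model trained with gradient accumulation:
    gradients from `accum_steps` micro-batches of size `micro_batch`
    are summed, then averaged into one update (effective batch = 16)."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    grad_sum = np.zeros_like(w)
    for step in range(1, total_steps + 1):
        idx = rng.integers(0, len(X), size=micro_batch)
        xb, yb = X[idx], y[idx]
        # Gradient of 0.5 * mean squared error on this micro-batch.
        grad_sum += xb.T @ (xb @ w - yb) / micro_batch
        if step % accum_steps == 0:  # one optimizer step per accum_steps micro-batches
            lr = linear_warmup_lr(step, base_lr=0.1, warmup_steps=8,
                                  total_steps=total_steps)
            w -= lr * grad_sum / accum_steps  # average over the effective batch
            grad_sum[:] = 0.0
    return w
```

The paper trains with Adam rather than plain SGD; the sketch only illustrates the accumulation and warmup mechanics.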
Results
  • The authors report the performance of all models for automatic evaluation and the top 6 model variations for human evaluation.
  • The authors follow a similar crowdsourcing setup to the validation presented in Section 4.4 to measure the quality of generated inferences.
  • The authors show crowdworkers the full story, a specified dimension, and a generated inference.
  • Following Zhang et al (2017), the authors ask workers to judge the likelihood of inferences based on a 5-point Likert scale: obviously true (5), generally true (4), plausible (3), neutral or unclear (2), and doesn’t make sense (1).
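A hypothetical aggregation helper (not the authors' evaluation script) makes the 5-point scale concrete: each worker label maps to a score from 1 to 5, and per-dimension summaries report the mean score and the fraction of inferences rated at least "plausible".

```python
from statistics import mean

# Label-to-score mapping from the 5-point Likert scale described above.
LIKERT = {
    "obviously true": 5,
    "generally true": 4,
    "plausible": 3,
    "neutral or unclear": 2,
    "doesn't make sense": 1,
}

def summarize(ratings):
    """Aggregate (dimension, label) worker judgments: mean Likert score
    and the fraction of inferences rated at least 'plausible' (>= 3)."""
    by_dim = {}
    for dim, label in ratings:
        by_dim.setdefault(dim, []).append(LIKERT[label])
    return {
        dim: {"mean": mean(scores),
              "plausible_rate": sum(s >= 3 for s in scores) / len(scores)}
        for dim, scores in by_dim.items()
    }
```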
Conclusion
  • The authors introduced a new task of discourse-aware commonsense inference over narratives. To target this task, the authors proposed a new model, PARA-COMET, trained using distant supervision, that captures narrative discourse.

    Despite the challenges of the task, the authors demonstrated the effectiveness of the approach using both automatic and human evaluations.
  • The authors' models were able to generate more implicit and novel discourse-aware inferences.
  • The authors are interested in exploring further extensions of the work to downstream paragraph- and narrative-level tasks that may benefit from access to commonsense knowledge
Summary
  • Introduction:

    Narrative understanding is a long-standing challenge in the field of natural language processing (NLP) (Charniak 1972; Winograd 1972).
  • The most crucial aspect of narrative understanding is the ability to make implicit commonsense inferences about entities and events in a story and to refine them as the story unfolds (Pettijohn and Radvansky 2016; Williams, Lieberman, and Winston 2017; Rashkin et al 2018; Qin et al 2019).
  • The event e1 and the inference e2 are natural language templates consisting of variables PersonX for the agent and PersonY for other participants in the event.
  • Objectives:

    Understanding these stories requires commonsense and temporal inferences that the authors aim to capture.
  • The authors aim to generate the types of commonsense inferences defined by the ATOMIC knowledge base (Sap et al 2019)
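The ATOMIC templates mentioned above can be illustrated with a small sketch: a (dimension, inference) pair is rendered as a natural language sentence with the PersonX/PersonY placeholders kept. The dimension names follow ATOMIC, but the exact template wording (cf. Table 2) is assumed here, not quoted from the paper.

```python
# Illustrative templates in the spirit of Table 2; the paper's exact
# wording may differ.
TEMPLATES = {
    "xIntent": "PersonX wants {}",
    "xEffect": "PersonX then {}",
    "xReact": "PersonX feels {}",
    "oReact": "PersonY feels {}",
}

def verbalize(dimension, inference):
    """Render an ATOMIC (dimension, tail) pair as a natural language
    sentence, keeping the PersonX/PersonY variables unfilled."""
    return TEMPLATES[dimension].format(inference)
```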
Tables
  • Table1: Examples generated from the models in this paper: a discourse-agnostic (sentence-level) baseline, vs. our discourse-aware PARA-COMET. We highlight the sentence that each inference was generated for in bold. Inferences are marked as plausible (✓) or implausible (✗)
  • Table2: Natural language templates for ATOMIC dimensions
  • Table3: Examples from the distantly supervised dataset. We highlight the most relevant (i.e. potentially contradictory or supporting) sections in the story for each inference being considered. LM score shows the average token log probability. Inferences are marked as relevant (✓) or irrelevant (✗)
  • Table4: Human evaluation results. We highlight the overall best performing model in bold
  • Table5: Performance according to the automatic evaluation metrics. The NLI score is the percent of stories for which the model predicted entail or neutral
  • Table6: Example from COSMOSQA with COMeT (beam10) and PARA-COMeT predictions
  • Table7: Special tokens used
  • Table8: Full human evaluation results
  • Table9: Examples generated from the models in this paper: a discourse-agnostic sentence-level baseline, vs. our discourse-aware PARA-COMET. We highlight the sentence that each inference was generated for in bold. Inferences are marked as plausible (✓) or implausible (✗)
Funding
  • This research was supported in part by DARPA under the CwC program through the ARO (W911NF-15-10543) and DARPA under the MCS program through NIWC Pacific (N66001-19-2-4031)
Reference
  • Ammanabrolu, P.; Cheung, W.; Broniec, W.; and Riedl, M. O. 2020. Automated Storytelling via Causal, Commonsense Plot Ordering. arXiv preprint arXiv:2009.00829.
  • Bhagavatula, C.; Le Bras, R.; Malaviya, C.; Sakaguchi, K.; Holtzman, A.; Rashkin, H.; Downey, D.; Yih, W.-t.; and Choi, Y. 2019. Abductive Commonsense Reasoning. In International Conference on Learning Representations.
  • Bosselut, A.; Rashkin, H.; Sap, M.; Malaviya, C.; Celikyilmaz, A.; and Choi, Y. 2019. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. In ACL.
  • Chakrabarty, T.; Ghosh, D.; Muresan, S.; and Peng, N. 2020. R3: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7976–7986. Online: Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.711. URL https://www.aclweb.org/anthology/2020.acl-main.711.
  • Chambers, N.; and Jurafsky, D. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT, 789–797.
  • Charniak, E. 1972. Toward a model of children’s story comprehension. Ph.D. thesis, Massachusetts Institute of Technology.
  • Dagan, I.; Roth, D.; Zanzotto, F.; and Sammons, M. 2013. Recognizing textual entailment: Models and applications. Morgan & Claypool Publishers.
  • Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
  • Feldman, J.; Davison, J.; and Rush, A. M. 2019. Commonsense Knowledge Mining from Pretrained Models. In EMNLP/IJCNLP.
  • Fleiss, J. L. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5): 378.
  • Granroth-Wilding, M.; and Clark, S. 2016. What happens next? Event prediction using a compositional neural network model. In Thirtieth AAAI Conference on Artificial Intelligence.
  • Guan, J.; Huang, F.; Zhao, Z.; Zhu, X.; and Huang, M. 2020. A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation. Transactions of the Association for Computational Linguistics 8: 93–108.
  • Hale, J. 2001. A Probabilistic Earley Parser as a Psycholinguistic Model. In NAACL.
  • Heider, F. 1958. The Psychology of Interpersonal Relations. John Wiley & Sons Inc.
  • Huang, L.; Le Bras, R.; Bhagavatula, C.; and Choi, Y. 2019. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. In EMNLP/IJCNLP.
  • Jans, B.; Bethard, S.; Vulic, I.; and Moens, M. F. 2012. Skip n-grams and ranking functions for predicting script events. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 336–344. Association for Computational Linguistics.
  • Jastrzebski, S.; Bahdanau, D.; Hosseini, S.; Noukhovitch, M.; Bengio, Y.; and Cheung, J. C. K. 2018. Commonsense mining as knowledge base completion? A study on the impact of novelty. ArXiv abs/1804.09259.
  • Kearns, W. R.; Kaura, N.; Divina, M.; Vo, C. V.; Si, D.; Ward, T. M.; and Yuwen, W. 2020. A Wizard-of-Oz Interface and Persona-based Methodology for Collecting Health Counseling Dialog. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems.
  • Kingma, D. P.; and Ba, J. 2015. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980.
  • Kozareva, Z.; and Hovy, E. 2011. Learning Temporal Information for States and Events. In 2011 IEEE Fifth International Conference on Semantic Computing, 424–429.
  • Li, Z.; Ding, X.; and Liu, T. 2018. Constructing Narrative Event Evolutionary Graph for Script Event Prediction. In IJCAI.
  • Lin, C.-Y. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, 74–81. Barcelona, Spain: Association for Computational Linguistics. URL https://www.aclweb.org/anthology/W04-1013.
  • Mostafazadeh, N.; Chambers, N.; He, X.; Parikh, D.; Batra, D.; Vanderwende, L.; Kohli, P.; and Allen, J. 2016. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 839–849. San Diego, California: Association for Computational Linguistics. doi:10.18653/v1/N16-1098. URL https://www.aclweb.org/anthology/N16-1098.
  • Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2001. BLEU: a Method for Automatic Evaluation of Machine Translation. In ACL.
  • Pettijohn, K.; and Radvansky, G. 2016. Narrative event boundaries, reading times, and expectation. Memory & Cognition 44: 1064–1075.
  • Pichotta, K.; and Mooney, R. 2014. Statistical Script Learning with Multi-Argument Events. In EACL, 220–229.
  • Qin, L.; Bosselut, A.; Holtzman, A.; Bhagavatula, C.; Clark, E.; and Choi, Y. 2019. Counterfactual Story Reasoning and Generation. ArXiv abs/1909.04076.
  • Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language Models are Unsupervised Multitask Learners. OpenAI technical report.
  • Rashkin, H.; Bosselut, A.; Sap, M.; Knight, K.; and Choi, Y. 2018. Modeling Naive Psychology of Characters in Simple Commonsense Stories. In ACL.
  • Roemmele, M.; Bejan, C. A.; and Gordon, A. S. 2011. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In 2011 AAAI Spring Symposium Series.
  • Rudinger, R.; Rastogi, P.; Ferraro, F.; and Van Durme, B. 2015. Script induction as language modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1681–1686.
  • Sap, M.; Gabriel, S.; Qin, L.; Jurafsky, D.; Smith, N. A.; and Choi, Y. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5477–5490. Online: Association for Computational Linguistics. doi:10.18653/v1/2020.acl-main.486. URL https://www.aclweb.org/anthology/2020.acl-main.486.
  • Sap, M.; Le Bras, R.; Allaway, E.; Bhagavatula, C.; Lourie, N.; Rashkin, H.; Roof, B.; Smith, N. A.; and Choi, Y. 2019. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In AAAI.
  • Schank, R. C.; and Abelson, R. P. 1977. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Lawrence Erlbaum Associates.
  • Shannon, C. E. 1948. The Mathematical Theory of Communication.
  • Speer, R.; Chin, J.; and Havasi, C. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.
  • Tulving, E.; and Donaldson, W. 1972. Episodic and semantic memory. In Organization of Memory.
  • Williams, B.; Lieberman, H.; and Winston, P. H. 2017. Understanding Stories with Large-Scale Common Sense. In COMMONSENSE.
  • Winograd, T. 1972. Understanding natural language. Cognitive Psychology 3(1): 1–191.
  • Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; and Brew, J. 2019. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. ArXiv abs/1910.03771.
  • Zhang, S.; Rudinger, R.; Duh, K.; and Durme, B. V. 2017. Ordinal Common-sense Inference. Transactions of the Association for Computational Linguistics 5: 379–395.
  • Zhang, Z.; Wu, Y.-W.; Hai, Z.; Li, Z.; Zhang, S.; Zhou, X.; and Zhou, X. 2019. Semantics-aware BERT for Language Understanding. ArXiv abs/1909.02209.
  • Zhou, B.; Khashabi, D.; Ning, Q.; and Roth, D. 2019. Going on a vacation takes longer than Going for a walk: A Study of Temporal Commonsense Understanding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3354–3360.