Temporal Common Sense Acquisition with Minimal Supervision

ACL, pp. 7579-7589, 2020.


Abstract:

Temporal common sense (e.g., duration and frequency of events) is crucial for understanding natural language. However, its acquisition is challenging, partly because such information is often not expressed explicitly in text, and human annotation of such concepts is costly. This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense mined from unannotated free text to jointly model three key dimensions of events: duration, frequency, and typical time.

Introduction
  • It is important to understand time as expressed in natural language text.
  • Understanding time in natural language text heavily relies on common sense inference.
  • Such inference is challenging since commonsense information is rarely made explicit in text; even when such information is mentioned, it is often subject to reporting bias.
  • For example, common sense suggests that "Dr. Porter is taking a walk" describes a much shorter event than "Dr. Porter is taking his weekend break" (a toy probe of this contrast follows this list).
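
As a toy illustration of how such knowledge is only implicitly present in pre-trained language models, the sketch below probes an off-the-shelf BERT model with a hand-written prompt. The prompt wording is an assumption for illustration and is not the paper's method.

    # Toy probe of a masked LM for temporal common sense; NOT the paper's system.
    # Requires: pip install transformers torch
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for event in ["taking a walk", "taking his weekend break"]:
        preds = unmasker(f"Dr. Porter is {event}. It will take him several [MASK].")
        # Print the top 3 predicted units; outputs vary, and, as the paper argues,
        # such raw probes are affected by reporting bias.
        print(event, "->", [p["token_str"] for p in preds[:3]])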
Highlights
  • Time is crucial when describing the evolving world
  • Understanding time in natural language text heavily relies on common sense inference
  • We find the majority of the original annotations to be unsuitable, as many of them are attached to events that are seemingly undecidable by common sense
  • Despite the existence of several prior works on event duration, this is the first attempt to jointly model three key dimensions of temporal common sense—duration, frequency, and typical time—from cheap supervision signals mined from unannotated free text
  • The proposed sequence modeling framework improves over BERT in handling reporting bias, taking ordinal relations into account, and exploiting interactions among multiple dimensions of time
  • The success of this model is confirmed by intrinsic evaluations on RealNews and UDS-T, as well as extrinsic evaluations on TimeBank, HiEve, and MCTACO
Methods
  • The authors experiment with several variants of the proposed system to study the effect of each change.
  • A model that takes three input sentences (the sentence containing the event plus surrounding context) is labeled MS; non-MS models use only the single sentence in which the event occurs.
  • A model with p_event = 0.6, i.e., a higher probability of masking event tokens, is labeled AM; otherwise p_event = 0.15.
  • Final model: the authors' final model includes all auxiliary dimensions (AUX), uses a soft cross-entropy loss (SL), and applies weight adjustment (ADJ); a minimal sketch of such a soft loss follows this list.
  • The authors study the effect of each change by ablating it individually
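
The paper's exact loss formulation is not reproduced here. The following is a minimal PyTorch sketch of a soft cross-entropy over ordinal labels, assuming an illustrative duration label set, in which ordinally close labels (e.g., hours vs. days) incur less loss than distant ones:

    # Minimal sketch (not the authors' code) of a soft cross-entropy loss
    # over ordinal temporal labels.
    import torch
    import torch.nn.functional as F

    # Assumed label set for illustration.
    DURATION_UNITS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

    def soft_targets(gold_idx: int, num_labels: int, temperature: float = 1.0) -> torch.Tensor:
        """Spread probability mass around the gold label so that ordinally
        close labels receive partial credit."""
        dist = (torch.arange(num_labels, dtype=torch.float) - gold_idx).abs()
        return F.softmax(-dist / temperature, dim=0)

    def soft_cross_entropy(logits: torch.Tensor, gold_idx: int) -> torch.Tensor:
        """Cross-entropy against the soft target instead of a one-hot label."""
        target = soft_targets(gold_idx, logits.size(-1))
        return -(target * F.log_softmax(logits, dim=-1)).sum()

    # Example: the model favors "days" while the gold label is "hours";
    # the loss is smaller than it would be for a prediction of "years".
    logits = torch.tensor([0.1, 0.2, 0.5, 2.0, 0.3, 0.1, 0.0])
    print(float(soft_cross_entropy(logits, DURATION_UNITS.index("hours"))))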
Results
  • Results on MCTACO are shown in Table 4.
  • The authors find that the model achieves better performance than the baselines on the three dimensions that are the focus of this work (duration, frequency, and typical time), as well as on stationarity.
  • The improvements are not substantial, which indicates the difficulty of this task and motivates future work.
  • The model does slightly worse on ordering, which is worth investigating in future work
Conclusion
  • Temporal common sense (TCS) is an important yet challenging research topic. Despite the existence of several prior works on event duration, this is the first attempt to jointly model three key dimensions of TCS—duration, frequency, and typical time—from cheap supervision signals mined from unannotated free text.
  • The proposed sequence modeling framework improves over BERT in handling reporting bias, taking ordinal relations into account, and exploiting interactions among multiple dimensions of time.
  • The success of this model is confirmed by intrinsic evaluations on RealNews and UDS-T, as well as extrinsic evaluations on TimeBank, HiEve, and MCTACO.
  • The proposed method may be an important module for future applications related to time
Tables
  • Table 1: Performance on intrinsic evaluations. The "normalized" row is the distance from the predicted label to the gold label divided by the total number of labels in each dimension; smaller is better (a short sketch of this metric follows this list).
  • Table 2: Performance on TimeBank classification
  • Table 3: Performance on HiEve. The numbers are percentages; higher is better
  • Table 4: Performance on MCTACO. Numbers are percentages under the exact match (EM) metric; higher is better
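
For concreteness, the "normalized" metric from the Table 1 caption can be computed as below; the duration label set is an illustrative assumption:

    # Sketch of the "normalized" intrinsic metric: distance between the
    # predicted and gold labels, divided by the number of labels in the
    # dimension. The label set is assumed for illustration.
    DURATION_LABELS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]

    def normalized_distance(pred: str, gold: str, labels=DURATION_LABELS) -> float:
        return abs(labels.index(pred) - labels.index(gold)) / len(labels)

    print(normalized_distance("days", "hours"))  # 1/7, about 0.14; smaller is better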
Related work
  • Common sense has been a popular topic in recent years, and existing NLP work has mainly investigated the acquisition and evaluation of commonsense reasoning in the physical world, including size, weight, and strength (Bagherinezhad et al., 2016; Forbes and Choi, 2017; Elazar et al., 2019), roundness and deliciousness (Yang et al., 2018), and intensity (Cocos et al., 2018). A handful of these works use cheap supervision. For example, Elazar et al. (2019) recently proposed a general framework that discovers distributions of quantitative attributes (e.g., length, mass, speed, and duration) from explicit mentions (or co-occurrences) of these attributes in a large corpus. However, Elazar et al. (2019) restrict events to be verb tokens, whereas this work handles verb phrases carrying more detailed information (e.g., "taking a vacation" is very different from "taking a break," although they share the same verb "take"). Moreover, there has been no report on the effectiveness of that method on temporal attributes.
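
To make the cheap-supervision idea concrete, the toy sketch below harvests explicit duration mentions from raw sentences with a simple pattern. The pattern and unit list are illustrative assumptions, not the extraction rules used by either work.

    # Toy mining of explicit duration mentions from unannotated text.
    import re
    from collections import Counter

    UNIT = r"(second|minute|hour|day|week|month|year)s?"
    PATTERN = re.compile(r"\bfor (?:about |around )?(?:an?|\d+) " + UNIT, re.IGNORECASE)

    def harvest_durations(sentences):
        """Count the temporal units that appear in explicit 'for ...' mentions."""
        counts = Counter()
        for sent in sentences:
            for match in PATTERN.finditer(sent):
                counts[match.group(1).lower()] += 1
        return counts

    corpus = [
        "He took a walk for about an hour before dinner.",
        "They traveled for 3 weeks across Europe.",
        "She rested briefly.",  # no explicit duration, so nothing is harvested
    ]
    print(harvest_durations(corpus))  # Counter({'hour': 1, 'week': 1})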
Funding
  • This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2019-19051600006 under the BETTER Program, and by Contract FA8750-19-2-1004 with the US Defense Advanced Research Projects Agency (DARPA).
  • This research is also supported by a grant from the Allen Institute for Artificial Intelligence (allenai.org).
References
  • Gabor Angeli, Christopher D. Manning, and Daniel Jurafsky. 2012. Parsing time: Learning to interpret time expressions. In NAACL-HLT, pages 446–455.
  • Hessam Bagherinezhad, Hannaneh Hajishirzi, Yejin Choi, and Ali Farhadi. 2016. Are elephants bigger than butterflies? Reasoning about sizes of objects. In AAAI.
  • Lisa Bauer, Yicheng Wang, and Mohit Bansal. 2018. Commonsense for generative multi-hop question answering tasks. In EMNLP, pages 4220–4230.
  • Steven Bethard, Guergana Savova, Wei-Te Chen, Leon Derczynski, James Pustejovsky, and Marc Verhagen. 2016. SemEval-2016 Task 12: Clinical TempEval. In SemEval, pages 1052–1062.
  • Nathanael Chambers, Taylor Cassidy, Bill McDowell, and Steven Bethard. 2014. Dense event ordering with a multi-pass architecture. TACL, 2:273–284.
  • Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In EMNLP, pages 33–40.
  • Anne Cocos, Veronica Wharton, Ellie Pavlick, Marianna Apidianaki, and Chris Callison-Burch. 2018. Learning scalar adjective intensity from paraphrases. In EMNLP, pages 1752–1762.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, and Dan Roth. 2019. How large are lions? Inducing distributions over quantitative attributes. In ACL.
  • Maxwell Forbes and Yejin Choi. 2017. Verb physics: Relative physical knowledge of actions and objects. In ACL, volume 1, pages 266–276.
  • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. AllenNLP: A deep semantic natural language processing platform. In NLP-OSS, pages 1–6.
  • Goran Glavaš, Jan Šnajder, Marie-Francine Moens, and Parisa Kordjamshidi. 2014. HiEve: A corpus for extracting event hierarchies from news stories. In LREC, pages 3678–3683, Reykjavik, Iceland. European Language Resources Association (ELRA).
  • Jonathan Gordon and Benjamin Van Durme. 2013. Reporting bias and knowledge acquisition. In AKBC, pages 25–30. ACM.
  • David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2003. English Gigaword. Linguistic Data Consortium, Philadelphia, 4(1):34.
  • Mark Granroth-Wilding and Stephen Christopher Clark. 2016. What happens next? Event prediction using a compositional neural network model. In ACL.
  • Andrey Gusev, Nathanael Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, and Dan Jurafsky. 2011. Using query patterns to learn the duration of events. In IWCS, pages 145–154.
  • Luyao Huang, Chi Sun, Xipeng Qiu, and Xuanjing Huang. 2019. GlossBERT: BERT for word sense disambiguation with gloss knowledge. In EMNLP, pages 3507–3512.
  • Kenton Lee, Yoav Artzi, Jesse Dodge, and Luke Zettlemoyer. 2014. Context-dependent semantic parsing for time expressions. In ACL, pages 1437–1447.
  • Artuur Leeuwenberg and Marie-Francine Moens. 2017. Structured learning for temporal relation extraction from clinical records. In EACL.
  • Artuur Leeuwenberg and Marie-Francine Moens. 2018. Temporal information extraction by predicting relative time-lines. In EMNLP.
  • Zhongyang Li, Xiao Ding, and Ting Liu. 2018. Constructing narrative event evolutionary graph for script event prediction. In IJCAI.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Hector Llorens, Nathanael Chambers, Naushad UzZaman, Nasrin Mostafazadeh, James Allen, and James Pustejovsky. 2015. SemEval-2015 Task 5: QA TempEval: Evaluating temporal information understanding with question answering. In SemEval, pages 792–800.
  • Qiang Ning, Zhili Feng, and Dan Roth. 2017. A structured learning approach to temporal relation extraction. In EMNLP.
  • Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth. 2018a. Joint reasoning for temporal and causal relations. In ACL.
  • Qiang Ning, Hao Wu, Haoruo Peng, and Dan Roth. 2018b. Improving temporal relation extraction with a globally acquired statistical resource. In NAACL, pages 841–851.
  • Qiang Ning, Ben Zhou, Zhili Feng, Haoruo Peng, and Dan Roth. 2018c. CogCompTime: A tool for understanding time in natural language. In EMNLP.
  • Feng Pan, Rutu Mulkar, and Jerry R. Hobbs. 2006. Extending TimeML with typical durations of events. In ARTE, pages 38–45. Association for Computational Linguistics.
  • Haoruo Peng, Qiang Ning, and Dan Roth. 2019. KnowSemLM: A knowledge infused semantic language model. In CoNLL.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL-HLT, pages 2227–2237.
  • James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro, et al. 2003. The TIMEBANK corpus. In Corpus Linguistics, volume 2003, page 40.
  • Roser Saurí, Jessica Littman, Bob Knippen, Robert Gaizauskas, Andrea Setzer, and James Pustejovsky. 2005. TimeML annotation guidelines.
  • Lenhart Schubert. 2002. Can we derive general world knowledge from texts? In HLT, pages 94–97. Morgan Kaufmann Publishers Inc.
  • Peng Shi and Jimmy Lin. 2019. Simple BERT models for relation extraction and semantic role labeling. arXiv preprint arXiv:1904.05255.
  • Jannik Strötgen and Michael Gertz. 2010. HeidelTime: High quality rule-based extraction and normalization of temporal expressions. In SemEval, pages 321–324. Association for Computational Linguistics.
  • Niket Tandon, Bhavana Dalvi, Joel Grus, Wen-tau Yih, Antoine Bosselut, and Peter Clark. 2018. Reasoning about actions and state changes by injecting commonsense knowledge. In EMNLP, pages 57–66.
  • Naushad UzZaman, Hector Llorens, James Allen, Leon Derczynski, Marc Verhagen, and James Pustejovsky. 2013. SemEval-2013 Task 1: TempEval-3: Evaluating time expressions, events, and temporal relations. *SEM, 2:1–9.
  • Benjamin Van Durme. 2009. Extracting implicit knowledge from text. Ph.D. thesis, University of Rochester.
  • Siddharth Vashishtha, Benjamin Van Durme, and Aaron Steven White. 2019. Fine-grained temporal relation extraction. In ACL, pages 2906–2919.
  • Alakananda Vempala, Eduardo Blanco, and Alexis Palmer. 2018. Determining event durations: Models and error analysis. In NAACL, volume 2, pages 164–168.
  • Jennifer Williams. 2012. Extracting fine-grained durations for verbs from Twitter. In ACL-SRW, pages 49–54. Association for Computational Linguistics.
  • Yiben Yang, Larry Birnbaum, Ji-Ping Wang, and Doug Downey. 2018. Extracting commonsense properties from embeddings with limited human guidance. In ACL, volume 2, pages 644–649.
  • Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In NeurIPS, pages 9054–9065. Curran Associates, Inc.
  • Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal common-sense inference. TACL, 5(1):379–395.
  • Ben Zhou, Daniel Khashabi, Qiang Ning, and Dan Roth. 2019. "Going on a vacation" takes longer than "going for a walk": A study of temporal commonsense understanding. In EMNLP.