
Annotating Temporal Dependency Graphs via Crowdsourcing

EMNLP 2020, pp. 5368-5380


Abstract

We present the construction of a corpus of 500 Wikinews articles annotated with temporal dependency graphs (TDGs) that can be used to train systems to understand temporal relations in text. We argue that temporal dependency graphs, built on previous research on narrative times and temporal anaphora, provide a representation scheme that achieves a good balance between completeness and practicality in temporal annotation.

Introduction
  • Understanding temporal relations between events in a text is an important part of understanding the “meaning” of text.
  • The largest data set that the authors are aware of is the data set used in TempEval-3 (UzZaman et al., 2013); it consists of 276 articles from the TimeBank Corpus (Pustejovsky et al., 2003b) and the AQUAINT Corpus
  • This data set was later re-annotated by Ning et al. (2018) to improve its annotation consistency using a crowdsourcing approach.
Highlights
  • Understanding temporal relations between events in a text is an important part of understanding the “meaning” of text
  • We extend the temporal dependency tree to temporal dependency graph (TDG), allowing each event to have a reference time expression, a reference event, or both
  • We investigate the feasibility of annotating TDGs from scratch via crowdsourcing, meaning we start with identifying events and time expressions, and annotate the temporal relations between them
  • We proposed a temporal annotation scheme called temporal dependency graphs which extend previous research on temporal dependency trees
  • We proposed a crowdsourcing strategy and demonstrated its feasibility with a comparative analysis of the quality of the annotation
  • To reduce the number of candidate time expressions and simplify this task, for each event we present crowd workers with the time expressions in the same paragraph, as well as those in the first paragraph, as its candidate reference timexes
  • To participate in this task, crowd workers need to achieve 70% accuracy on a qualifying test
  • We demonstrated the utility of the data set by training a neural ranking model on this data set, and the data set is publicly available
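A minimal sketch of the TDG data model described in the highlights above; class and field names are illustrative assumptions, not taken from the released corpus tools:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    text: str
    kind: str  # "event" or "timex"

@dataclass
class TDG:
    nodes: dict = field(default_factory=dict)
    ref_timex: dict = field(default_factory=dict)  # event id -> (timex id, relation)
    ref_event: dict = field(default_factory=dict)  # event id -> (event id, relation)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def link(self, child, parent, relation):
        # Unlike a tree, an event may keep BOTH a reference timex and a
        # reference event, which is what makes the structure a graph.
        target = self.ref_timex if self.nodes[parent].kind == "timex" else self.ref_event
        target[child] = (parent, relation)

g = TDG()
g.add_node(Node("t0", "DCT", "timex"))        # document creation time
g.add_node(Node("e1", "announced", "event"))
g.add_node(Node("e2", "said", "event"))
g.link("e1", "t0", "before")   # e1 anchored to a reference timex
g.link("e2", "e1", "after")    # e2 anchored to a reference event
```

Allowing an event two outgoing edges (one to a timex, one to an event) is the key difference from the earlier temporal dependency tree, where each node had exactly one parent.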
Methods
  • The authors test the data with an attention-based neural ranking temporal dependency parser that Zhang and Xue (2018a) developed for TDT, which parses the temporal dependency tree by ranking the candidate parents for each node.
  • To help the model learn the relations between the DCT and events, a POS tag feature is added that distinguishes present-tense verb events from all other events
  • This feature is represented as a one-hot vector.
  • The test data is annotated by experts, and the validation data is generated from crowdsourced annotation as follows: if there is no agreement for a question, i.e., the three crowd workers chose three different answers, then experts annotate that question
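The ranking step can be sketched as follows; the scorer here is a toy stand-in, whereas the actual parser scores candidate parents with a neural attention model:

```python
def present_tense_feature(pos_tag):
    # One-hot feature separating present-tense verb events (VBP/VBZ) from
    # all other events, intended to help the model relate events to the DCT.
    return [1.0, 0.0] if pos_tag in ("VBP", "VBZ") else [0.0, 1.0]

def rank_parents(node, candidates, score):
    # The parser selects, for each node, the highest-scoring candidate parent.
    return max(candidates, key=lambda cand: score(node, cand))

# Toy scorer (illustrative only): prefer the nearest candidate in the text.
def distance_score(node, cand):
    return -abs(node["position"] - cand["position"])

node = {"position": 10}
candidates = [{"id": "DCT", "position": 0}, {"id": "e3", "position": 8}]
parent = rank_parents(node, candidates, distance_score)
```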
Results
  • After the first subtask was completed, the authors found that for fewer than 200 time expressions, the reference time is not the DCT.
  • To reduce the number of candidate time expressions and simplify this task, for each event the authors present crowd workers with the time expressions in the same paragraph, as well as those in the first paragraph, as its candidate reference timexes.
  • To participate in this task, crowd workers need to achieve 70% accuracy on a qualifying test
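The candidate-filtering rule above can be sketched as follows; representing timexes with paragraph indices is an assumption made for illustration:

```python
def candidate_timexes(event_paragraph, timexes, first_paragraph=0):
    # Keep only time expressions from the event's own paragraph or from the
    # first paragraph, shrinking the candidate set shown to crowd workers.
    return [t for t in timexes
            if t["paragraph"] in (first_paragraph, event_paragraph)]

timexes = [
    {"id": "t0", "paragraph": 0},  # e.g. a timex in the lead paragraph
    {"id": "t1", "paragraph": 2},
    {"id": "t2", "paragraph": 5},
]
cands = candidate_timexes(2, timexes)
```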
Conclusion
  • The authors proposed a temporal annotation scheme called temporal dependency graphs which extend previous research on temporal dependency trees.
  • The temporal dependency graphs, like temporal dependency trees, draw inspiration from previous research on narrative times and temporal anaphora, and allow a good trade-off between completeness and practicality in temporal annotation.
  • The authors proposed a crowdsourcing strategy and demonstrated its feasibility with a comparative analysis of the quality of the annotation.
  • The authors demonstrated the utility of the data set by training a neural ranking model on this data set, and the data set is publicly available
Tables
  • Table1: Temporal relations used in TDT
  • Table2: Events, time expressions and temporal relations in various corpora
  • Table3: Distribution of temporal relations between pairs of events. RE refers to reference event
  • Table4: Agreement F1 and WAWA for time expression identification (ID), time expression reference time (RT) identification and event identification
  • Table5: Agreement F1 for reference timex (RT) and reference event (RE) identification for events. The third column evaluates the labeled (L) annotation, the fourth column evaluates the unlabeled (U) annotation
  • Table6: Relation only annotation agreement
  • Table7: Distribution of the cause of the wrong annotations in the 100 sampled instances
  • Table8: Experiment results of the baseline system and the neural ranking model
Related work
  • 7.1 Temporal Dependency Structure

    Kolomiyets et al. (2012) were the first to use the term temporal dependencies; they extract timelines from narrative stories as temporal dependency trees. However, in their work, only events are included as nodes in the dependency tree, and the parent of each node is not explicitly defined as the reference event of the child event. Zhang and Xue (2018b) first defined a temporal dependency tree structure that has both events and time expressions as nodes in the tree, and attempted to explicitly define the parent of each event or time expression as the reference event or time expression of the child node. This temporal dependency tree has been applied to both Chinese (Zhang and Xue, 2018b) and English (Zhang and Xue, 2019) data, and to both news reports and narrative stories, indicating that this framework can be applied across languages and genres. The present work extends temporal dependency trees to temporal dependency graphs, and crowdsources temporal dependency graphs on English news articles.

    7.2 Crowdsourcing Temporal Relations

    Early studies on crowdsourcing temporal relations usually focus on a subtask of this problem. Snow et al. (2008) crowdsource the relations of a subset of verb event pairs from TimeBank (Pustejovsky et al., 2003b) whose relations are either "strictly before" or "strictly after". Ng and Kan (2012) focus only on the relation between events and time expressions in news data. Caselli et al. (2016) conduct crowdsourcing experiments on both temporal relation annotation and event/time expression extraction. In their time expression extraction experiments, they ask crowd workers to select time expressions directly from the raw text. In contrast, we give crowd workers time expression candidates and ask them binary questions; our approach prevents crowd workers from selecting wrong textual spans. Ning et al. (2018) propose a multi-axis approach for event temporal relation annotation (see Ning et al., 2018, Section 2 and Appendix A for more details about their multi-axis model). The multi-axis approach is a way of factoring out modalities in event annotation, and combined with the decision to only consider the start point of events, they are able to achieve high accuracy in annotating temporal relations, assuming gold events are provided. Our annotation is more challenging in that crowd workers also need to identify time expressions and events, in addition to annotating temporal relations.
Funding
  • This work is supported in part by a grant from the IIS Division of the National Science Foundation (Award No. 1763926) entitled "Building a Uniform Meaning Representation for Natural Language Processing" awarded to the fourth author
  • All views expressed in this paper are those of the authors and do not necessarily represent the view of the National Science Foundation. This work was supported in part by DARPA/I2O and U.S. Army Research Office Contract No. W911NF-18-C-0003 under the World Modelers program, and the Office of the Director of National Intelligence (ODNI) and Intelligence Advanced Research Projects Activity (IARPA) via IARPA Contract No. 2019-19051600006 under the BETTER program
Study subjects and analysis
Wikinews articles: 500
We present the construction of a corpus of 500 Wikinews articles annotated with temporal dependency graphs (TDGs) that can be used to train systems to understand temporal relations in text. We argue that temporal dependency graphs, built on previous research on narrative times and temporal anaphora, provide a representation scheme that achieves a good balance between completeness and practicality in temporal annotation.

documents: 500
We use the same hyperparameter values as Zhang and Xue (2018a). Of the 500 documents, 400 are used as training data, 50 as validation data, and 50 as test data. The test data is annotated by experts, and the validation data is generated from crowdsourced annotation as follows: if there is no agreement for a question, i.e., the three crowd workers chose three different answers, then experts annotate that question
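The document-level split described above, as a deterministic sketch (document identifiers are illustrative):

```python
# 400 training / 50 validation / 50 test documents out of 500.
doc_ids = [f"doc{i:03d}" for i in range(500)]
train, dev, test = doc_ids[:400], doc_ids[400:450], doc_ids[450:]
```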

articles: 276
Even though the first temporal annotation scheme, TimeML (Pustejovsky et al., 2003a; Saurí et al., 2006), was proposed over a decade ago, temporally annotated data is still relatively scarce. The largest data set that we are aware of is the data set used in TempEval-3 (UzZaman et al., 2013), and it consists of 276 articles from the TimeBank Corpus (Pustejovsky et al., 2003b) and the AQUAINT Corpus. This data set was later re-annotated by Ning et al. (2018) to improve its annotation consistency using a crowdsourcing approach

articles: 36
Even with this restriction, it is still difficult to produce a large data set. For example, there are only 36 articles in TimeBank-Dense (Cassidy et al., 2014)

Wikinews articles with this approach: 500
We show that with a carefully designed annotation strategy, annotating TDGs via crowdsourcing is feasible. We annotated a corpus of 500 Wikinews articles with this approach, and created the largest corpus in terms of the number of articles and the number of event or time expression pairs. The remainder of the paper is organized as follows

crowd workers: 3
After all annotation steps are completed, we assemble the TDG for each text, and an example that illustrates the step-by-step construction of the TDG of a text is provided in Figure 3. In all steps, each annotation is completed by three crowd workers using the Amazon Mechanical Turk platform. Unless otherwise specified, the majority-voted answer is designated as the gold annotation

crowd workers: 3
Specifically, we compute the average accuracy of each worker and create a “best workers” group which consists of the crowd workers whose average accuracy is above 0.7. Then, for each question, if the three crowd workers give the same answer, that answer becomes the gold answer. Otherwise, if one crowd worker is in the “best workers” group, his/her answer becomes the gold answer; else, the majority answer is the gold answer
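The adjudication rule in this snippet can be sketched as follows; this is an illustrative reconstruction, not the authors' released code:

```python
from collections import Counter

def gold_answer(answers, workers, best_workers):
    # 1) Unanimous answers become gold directly.
    if len(set(answers)) == 1:
        return answers[0]
    # 2) Otherwise, if a "best worker" (average accuracy above 0.7)
    #    answered this question, take that worker's answer.
    for worker, answer in zip(workers, answers):
        if worker in best_workers:
            return answer
    # 3) Otherwise fall back to the majority answer.
    return Counter(answers).most_common(1)[0][0]
```

Note the priority ordering: a best worker's answer overrides a two-vote majority, which is what distinguishes this scheme from plain majority voting.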

news articles: 500
MATRES annotates verb events on the main axis and orthogonal axes (see Ning et al., 2018 for their axis types), and does not annotate the relations between events and time expressions. Compared to the four TimeBank-based corpora, our corpus is much larger on every count, with 500 news articles, 14,974 events, 2,485 time expressions, and 28,350 temporal relations.

crowd workers: 3
The test data is annotated by experts, and the validation data is generated from crowdsourced annotation as follows: if there is no agreement for a question, i.e., the three crowd workers chose three different answers, then experts annotate that question. The parser code is available at https://github.com/yuchenz/tdp_ranking. We also develop a heuristic baseline system as follows

Reference
  • Steven Bethard, James H. Martin, and Sara Klingenstein. 2007. Timelines from text: Identification of syntactic temporal relations. In International Conference on Semantic Computing (ICSC 2007), pages 11–18. IEEE.
  • Jürgen Bohnemeyer. 2009. Temporal anaphora in a tenseless language. The Expression of Time in Language, pages 83–128.
  • Tommaso Caselli, Rachele Sprugnoli, and Oana Inel. 2016. Temporal information annotation: Crowd vs. experts. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3502–3509, Portorož, Slovenia. European Language Resources Association (ELRA).
  • Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard. 2014. An annotation framework for dense event ordering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 501–506, Baltimore, Maryland. Association for Computational Linguistics.
  • Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, Doha, Qatar. Association for Computational Linguistics.
  • Erhard Hinrichs. 1986. Temporal anaphora in discourses of English. Linguistics and Philosophy, pages 63–82.
  • Oleksandr Kolomiyets, Steven Bethard, and Marie-Francine Moens. 2012. Extracting narrative timelines as temporal dependency structures. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 88–97, Jeju Island, Korea. Association for Computational Linguistics.
  • Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.
  • Jun-Ping Ng and Min-Yen Kan. 2012. Improved temporal relation classification using dependency parses and selective crowdsourced annotations. In Proceedings of COLING 2012, pages 2109–2124, Mumbai, India. The COLING 2012 Organizing Committee.
  • Qiang Ning, Hao Wu, and Dan Roth. 2018. A multi-axis annotation scheme for event temporal relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1318–1328, Melbourne, Australia. Association for Computational Linguistics.
  • Barbara H. Partee. 1984. Nominal and temporal anaphora. Linguistics and Philosophy, pages 243–286.
  • Barbara Hall Partee. 1973. Some structural analogies between tenses and pronouns in English. The Journal of Philosophy, 70(18):601–609.
  • James Pustejovsky, José M. Castaño, Robert Ingria, Roser Saurí, Robert J. Gaizauskas, Andrea Setzer, Graham Katz, and Dragomir R. Radev. 2003a. TimeML: Robust specification of event and temporal expressions in text. New Directions in Question Answering, 3:28–34.
  • James Pustejovsky, Patrick Hanks, Roser Saurí, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro, et al. 2003b. The TimeBank corpus. In Corpus Linguistics, volume 2003, page 40.
  • James Pustejovsky and Amber Stubbs. 2011. Increasing informativeness in temporal annotation. In Proceedings of the 5th Linguistic Annotation Workshop, pages 152–160. Association for Computational Linguistics.
  • Hans Reichenbach. 1947. Elements of Symbolic Logic.
  • Nils Reimers, Nazanin Dehghani, and Iryna Gurevych. 2016. Temporal anchoring of events for the TimeBank corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2195–2204.
  • Roser Saurí, Jessica Littman, Bob Knippen, Robert Gaizauskas, Andrea Setzer, and James Pustejovsky. 2006. TimeML annotation guidelines. Version, 1(1):31.
  • Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Ng. 2008. Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 254–263, Honolulu, Hawaii. Association for Computational Linguistics.
  • Jannik Strötgen and Michael Gertz. 2013. Multilingual and cross-domain temporal tagging. Language Resources and Evaluation, 47(2):269–298.
  • Naushad UzZaman, Hector Llorens, James F. Allen, Leon Derczynski, Marc Verhagen, and James Pustejovsky. 2012. TempEval-3: Evaluating events, time expressions, and temporal relations. CoRR, abs/1206.5333.
  • Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen, and James Pustejovsky. 2013. SemEval-2013 Task 1: TempEval-3: Evaluating time expressions, events, and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 1–9.
  • Bonnie Lynn Webber. 1988. Tense as discourse anaphor. Computational Linguistics, 14(2):61–73.
  • Yuchen Zhang and Nianwen Xue. 2018a. Neural ranking models for temporal dependency structure parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3339–3349, Brussels, Belgium. Association for Computational Linguistics.
  • Yuchen Zhang and Nianwen Xue. 2018b. Structured interpretation of temporal relations. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
  • Yuchen Zhang and Nianwen Xue. 2019. Acquiring structured temporal representation via crowdsourcing: A feasibility study. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 178–185, Minneapolis, Minnesota. Association for Computational Linguistics.
Author
Jiarui Yao
Haoling Qiu