What time is it? Temporal Analysis of Novels

EMNLP 2020, pp. 9076–9086.

Abstract

Recognizing the flow of time in a story is a crucial aspect of understanding it. Prior work related to time has primarily focused on identifying temporal expressions or relative sequencing of events, but here we propose computationally annotating each line of a book with wall clock times, even in the absence of explicit time-descriptive p...

Introduction
  • The flow of time is an indispensable guide for our actions, and provides a framework in which to see a logical progression of events.
  • In most works of fiction, the events of the story take place during recognizable time periods over the course of the day.
  • The authors try to capture the flow of time through novels by attempting to recognize the time of day at which each event in the story takes place.
  • Figure 1 presents the work of two human annotators, independently making their best guesses of the clock time at every paragraph of the book.
  • The times identified by the annotators are shown in blue, while the model’s time predictions are shown in red
Highlights
  • The flow of time is an indispensable guide for our actions, and provides a framework in which to see a logical progression of events
  • We use “The Great Gatsby” (Fitzgerald, 1925), a short novel with a familiar plot that can be analyzed with our techniques
  • We describe our data collection process, how we extracted temporal expressions along with some analysis of the phrases (Section 3)
  • For the sake of completeness, we present results in this paper that use Gutenberg and HathiTrust books independently
  • We include results when running these models purely on Gutenberg data as well as the results when running these models with the HathiTrust data
  • The predictive power of the model did not significantly improve past this window size
  • We show the results when purely trained on Gutenberg books as well
Results
  • The results are shown in Table 3 with the metrics of accuracy and F1 scores for each class.
  • The authors see that by adding more data to the LSTM and BERT models, the average error improves significantly; unsurprisingly, the naive Bayes model improves only slightly.
  • The authors' results for average error in hours are shown in Table 7.
  • One might ask why the error is higher compared to the 24-hour model.
  • This is because, while the model performs well on local windows that contain a time reference, neighboring windows tend to give little signal about time, confusing the model about which windows should be emphasized.
  • The authors group the novels' authors by year of birth, in 20-year bins.
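The errors above are measured on a circular clock: the paper notes that the worst possible error on a 24-hour clock is 12 hours, and that a random-guessing baseline would expect an average error of 6 hours. A minimal sketch of such a circular error metric (function names are hypothetical, not from the paper's code):

```python
def circular_hour_error(pred: int, gold: int, clock: int = 24) -> int:
    """Distance between two hours on a circular clock.

    On a 24-hour clock the worst case is 12 hours, e.g. predicting
    noon when the gold label is midnight.
    """
    diff = abs(pred - gold) % clock
    return min(diff, clock - diff)


def average_error(preds, golds):
    """Mean circular error over paired predicted and gold hours."""
    return sum(circular_hour_error(p, g) for p, g in zip(preds, golds)) / len(preds)
```

With this metric, predicting 23:00 against a gold label of 01:00 counts as an error of 2 hours, not 22.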
Conclusion
  • The authors have constructed a dataset of time phrases to build models that can predict the most relevant hour of the day for a given text window.
  • The authors' models are a good start, but the authors release the dataset to encourage others to improve on this task.
  • The authors note that this dataset can be further cleaned by resolving OCR errors in the source text as well as improving upon the time extraction algorithm.
  • More annotations of complete novels would permit better models and evaluation.
  • Future work includes applying time inference models to question answering and other NLP systems
Objectives

    Given a sequence of sentence windows s1, s2, . . . , sn and the number of segments, parameterized as k, the goal is to generate the most likely list of indices i1, i2, . . . , ik that represent the start of each segment, and a list of hours h1, h2, . . . , hk that represents the corresponding hour assigned to each segment.
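This objective can be made concrete with a small dynamic-programming sketch (a hedged illustration: all names are hypothetical, and the paper's actual segmentation procedure may differ). Given an n-window sequence of per-hour scores (e.g. log-probabilities), it chooses k segment start indices and one hour per segment so that the total score of each window under its segment's hour is maximized:

```python
import math


def segment_hours(scores, k):
    """scores: list of n rows, each a list of per-hour scores.
    Returns (starts, hours): k segment start indices i1..ik and the
    hour assigned to each segment, maximizing the summed score of
    every window under its segment's hour."""
    n, H = len(scores), len(scores[0])
    # prefix[i][h] = sum of scores[0..i-1][h], for O(1) segment sums
    prefix = [[0.0] * H]
    for row in scores:
        prefix.append([prefix[-1][h] + row[h] for h in range(H)])

    def best(i, j):
        # best (score, hour) for the segment of windows [i, j)
        return max((prefix[j][h] - prefix[i][h], h) for h in range(H))

    # dp[j][m] = best total score partitioning windows [0, j) into m segments
    NEG = -math.inf
    dp = [[NEG] * (k + 1) for _ in range(n + 1)]
    back = [[None] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for m in range(1, min(j, k) + 1):
            for i in range(m - 1, j):
                if dp[i][m - 1] == NEG:
                    continue
                score, hour = best(i, j)
                if dp[i][m - 1] + score > dp[j][m]:
                    dp[j][m] = dp[i][m - 1] + score
                    back[j][m] = (i, hour)

    # walk backpointers to recover segment starts and hours
    starts, hours, j, m = [], [], n, k
    while m > 0:
        i, hour = back[j][m]
        starts.append(i)
        hours.append(hour)
        j, m = i, m - 1
    return starts[::-1], hours[::-1]
```

This exhaustive search runs in O(n²·k·H) time; it is only meant to pin down the objective, not to reproduce the paper's implementation.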
Tables
  • Table1: Number of time examples by hour. The most frequent explicit time references are to noon and midnight
  • Table2: Top three hours for select feature words, consistent with common experience
  • Table3: AM/PM Prediction Results for Gutenberg / HathiTrust
  • Table4: Agreement between Annotators and Model for AM/PM prediction
  • Table5: Time-of-day prediction error by hour for HathiTrust books
  • Table6: Average time-of-day prediction error for Gutenberg and HathiTrust books
  • Table7: Book Time Prediction Results
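The paper describes assigning each partition the hour with the maximal sum of per-window probabilities. As a concrete illustration of that assignment step (the function name is hypothetical, not from the paper's code):

```python
def assign_partition_hour(window_probs):
    """window_probs: one probability vector over the hours for each
    window in a partition. Sums the vectors element-wise and returns
    the hour (index) with the maximal summed probability."""
    totals = [sum(col) for col in zip(*window_probs)]
    return max(range(len(totals)), key=totals.__getitem__)
```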
Funding
  • This work was partially supported by NSF grants IIS-1926751, IIS-1927227, and IIS-1546113.
References
  • David Ahn. 2006. The stages of event extraction. In Proceedings of the Workshop on Annotating and Reasoning about Time and Events, pages 1–8.
  • James F. Allen. 1983. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843.
  • Cecilia Ovesdotter Alm and Richard Sproat. 2005. Emotional sequencing and development in fairy tales. In International Conference on Affective Computing and Intelligent Interaction, pages 668–674. Springer.
  • Steven Bethard and James H. Martin. 2007. CU-TMP: Temporal relation classification using syntactic and semantic features. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 129–132.
  • Philip Bramsen, Pawan Deshpande, Yoong Keok Lee, and Regina Barzilay. 2006. Finding temporal order in discharge summaries. In AMIA Annual Symposium Proceedings, volume 2006, page 81. American Medical Informatics Association.
  • Bertram C. Bruce. 1972. A model for temporal references and its application in a question answering program.
  • Nathanael Chambers, Shan Wang, and Dan Jurafsky. 2007. Classifying temporal relations between events. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions, pages 173–176.
  • Angel X. Chang and Christopher D. Manning. 2012. SUTime: A library for recognizing and normalizing time expressions. In LREC, volume 2012, pages 3735–3740.
  • Franciska de Jong, Henning Rode, and Djoerd Hiemstra. 2005. Temporal language models for the disclosure of historical text. In Humanities, Computers and Cultural Heritage: Proceedings of the XVIth International Conference of the Association for History and Computing (AHC 2005), pages 161–168. Koninklijke Nederlandse Academie van Wetenschappen.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • David DiLaura. 2008. A brief history of lighting. Optics and Photonics News, 19(9):22–28.
  • Micha Elsner. 2012. Character-based kernels for novelistic plot structure. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 634–644. Association for Computational Linguistics.
  • Xiaocheng Feng, Bing Qin, and Ting Liu. 2018. A language-independent neural network for event detection. Science China Information Sciences, 61(9):092106.
  • Frank Fischer and Jannik Strötgen. 2015. When does (German) literature take place? On the analysis of temporal expressions in large corpora. Proceedings of Digital Humanities (DH 2015).
  • F. Scott Fitzgerald. 1925. The Great Gatsby. Scribner.
  • Anne Garcia-Fernandez, Anne-Laure Ligozat, Marco Dinarelli, and Delphine Bernhard. 2011. When was it written? Automatically determining publication dates. In International Symposium on String Processing and Information Retrieval, pages 221–236. Springer.
  • Project Gutenberg. n.d. www.gutenberg.org. Accessed: May 2020.
  • Adam Jatowt, Ching-Man Au Yeung, and Katsumi Tanaka. 2013. Estimating document focus time. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 2273–2278.
  • Matthew L. Jockers. 2015. Revealing sentiment and plot arcs with the syuzhet package.
  • Mirella Lapata and Alex Lascarides. 2006. Learning sentence-internal temporal relations. Journal of Artificial Intelligence Research, 27:85–117.
  • Qi Li, Heng Ji, and Liang Huang. 2013. Joint event extraction via structured prediction with global features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 73–82.
  • Shasha Liao and Ralph Grishman. 2010. Using document level cross-event inference to improve event extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 789–797.
  • Inderjeet Mani, Marc Verhagen, Ben Wellner, Chong Min Lee, and James Pustejovsky. 2006. Machine learning of temporal relations. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 753–760.
  • Inderjeet Mani and George Wilson. 2000. Robust temporal processing of news. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 69–76.
  • Lara McConnaughey, Jennifer Dai, and David Bamman. 2017. The labeled segmentation of printed books. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 737–747.
  • Congmin Min, Munirathnam Srikanth, and Abraham Fowler. 2007. LCC-TE: A hybrid approach to temporal relation identification in news text. In Proceedings of the 4th International Workshop on Semantic Evaluations, pages 219–222.
  • Saif Mohammad. 2013. From once upon a time to happily ever after: Tracking emotions in novels and fairy tales. arXiv preprint arXiv:1309.5909.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • James Pustejovsky, José M. Castaño, Robert Ingria, Roser Saurí, Robert J. Gaizauskas, Andrea Setzer, Graham Katz, and Dragomir R. Radev. 2003. TimeML: Robust specification of event and temporal expressions in text. New Directions in Question Answering, 3:28–34.
  • James Pustejovsky, Robert Knippen, Jessica Littman, and Roser Saurí. 2005. Temporal and event information in natural language text. Language Resources and Evaluation, 39(2-3):123–164.
  • Peng Qi, Timothy Dozat, Yuhao Zhang, and Christopher D. Manning. 2019. Universal Dependency parsing from scratch. arXiv preprint arXiv:1901.10457.
  • Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds. 2016. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Science, 5(1):31.
  • Andrea Setzer and Robert J. Gaizauskas. 2000. Annotating events and temporal information in newswire texts.
  • Matthew Sims, Jong Ho Park, and David Bamman. 2019. Literary event detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3623–3634.
  • Jannik Strötgen and Michael Gertz. 2010. HeidelTime: High quality rule-based extraction and normalization of temporal expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 321–324.
  • Ted Underwood. Metadata for English-language literature in HathiTrust Digital Library beyond 1923.
  • Naushad UzZaman and James F. Allen. 2010. Event and temporal expression extraction from raw text: First step towards a temporally aware system. International Journal of Semantic Computing, 4(04):487–508.
Author
Allen Kim
Charuta Pethe