Multi-view Story Characterization from Movie Plot Synopses and Reviews

EMNLP 2020, pp. 5629–5646 (2020)


Abstract

This paper considers the problem of characterizing stories by inferring properties such as theme and style using written synopses and reviews of movies. We experiment with a multi-label dataset of movie synopses and a tagset representing various attributes of stories (e.g., genre, type of events). Our proposed multi-view model encodes the ...

Introduction
  • A high-level description of stories represented by a tagset can assist consumers of story-based media during the selection process.
  • In contrast to the usual aspect-based opinions, reviews of story-based items often contain end users’ feelings, important events of stories, or genre-related information, which are abstract in nature and do not have a very specific target aspect.
  • Extraction of such opinions about stories has been approached by previous work using reviews of movies (Zhuang et al., 2006; Li et al., 2010) and books (Lin et al., 2013).
  • While the primary task is to retrieve relevant tags from a predefined tagset by supervised learning, the model also provides the ability to mine story aspects from reviews without any direct supervision.
Highlights
  • A high-level description of stories represented by a tagset can assist consumers of story-based media during the selection process.
  • The tagset is predefined by what was present in the training and development sets and is brittle; story attributes are unbounded in principle and grow with the underlying vocabulary.
  • Movie review mining: there is a subtle distinction between the reviews of typical material products and those of story-based items.
  • We modeled the problem from the perspective of Multiple Instance Learning and developed a multi-view
  • We demonstrated that exploiting user reviews can further improve performance and experimented with several methods for combining user reviews and synopses.
  • We developed an unsupervised technique to extract tags that identify complementary attributes of movies from user reviews. We believe that this coarse story understanding approach can be extended to longer stories, i.e., entire books, and are currently exploring this path in our ongoing work.
Methods
  • The authors treat the tag assignment task as a multi-label classification problem. They sort the predefined tagset Y_P in descending order of P(Y_P | X), so that tags with higher weights are ranked on top.
  • Convolutional Neural Network with Emotion Flow (CNN-EF): this system, one of the comparison systems, uses a convolutional neural network-based text encoder to extract features from written synopses and bidirectional LSTMs to model the flow of emotions in the stories (Kar et al., 2018b).
  • To the authors' knowledge, this method is currently the best-performing system on the task.
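The ranking step in the first bullet can be illustrated with a short Python sketch. This is not the authors' code: the function name rank_tags, the example tags, and the scores are hypothetical, and the scores merely stand in for the model's per-tag estimates of P(Y_P | X).

```python
from typing import Dict, List


def rank_tags(tag_scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-scoring tags, best first.

    tag_scores maps each tag in the predefined tagset to a score that
    stands in for the model's estimate of P(tag | synopsis).
    """
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(tag_scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


# Hypothetical scores for one synopsis (illustrative values only).
example_scores = {
    "murder": 0.81,
    "romantic": 0.12,
    "revenge": 0.64,
    "comedy": 0.05,
    "violence": 0.77,
}
top3 = rank_tags(example_scores, k=3)
print(top3)
```

Setting k to 3 or 5 would correspond to the top-3 and top-5 settings reported in the results.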
Results
  • Quantitative results: the authors report the results of the experiments on the test set in Table 2.
  • The authors mainly discuss the top-3 setting, where three tags are assigned to each instance by all systems.
  • Regarding the first research question, Table 2 shows that the proposed hierarchical model with attention, HN(A), outperforms all comparison systems. Table 2 reports F1 and tags learned (TL) in the top-3 and top-5 settings for the Most Frequent, CNN-EF, SBERT, HN(Maxpool), HN(A), and HN(A) + MIL systems.
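To make the two reported measures concrete, the sketch below computes them for a toy top-3 prediction. It rests on two assumptions that this page does not confirm: that F1 is micro-averaged over all (movie, tag) pairs, and that TL counts the distinct tags a system predicts across the evaluated set. The function name and the example data are hypothetical.

```python
from typing import List, Set, Tuple


def micro_f1_and_tags_learned(
    gold: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, int]:
    """Compute micro-averaged F1 and the number of distinct predicted tags."""
    # Count correct (movie, tag) pairs across the whole set.
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Assumed definition of TL: size of the union of all predicted tag sets.
    tags_learned = len(set().union(*predicted))
    return f1, tags_learned


# Two toy movies with three predicted tags each (illustrative values only).
gold_tags = [{"murder", "revenge", "violence"}, {"romantic", "comedy"}]
pred_tags = [{"murder", "violence", "comedy"}, {"romantic", "comedy", "murder"}]
f1, tl = micro_f1_and_tags_learned(gold_tags, pred_tags)
print(f"F1={f1:.3f}, TL={tl}")
```

Under these assumptions, a system that always predicts the same few frequent tags can score a reasonable F1 while its TL stays very low, which is why the two numbers are read together.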
Conclusion
  • The authors focused on characterizing stories by generating tags from synopses and reviews.
  • The authors' model learns to predict tags by identifying salient sentences and words from synopses and reviews.
  • The authors developed an unsupervised technique to extract tags that identify complementary attributes of movies from user reviews.
  • The authors believe that this coarse story understanding approach can be extended to longer stories, i.e., entire books, and are currently exploring this path in their ongoing work.
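The idea of predicting tags by identifying salient sentences can be sketched as an attention-weighted combination of sentence-level predictions, in the spirit of Multiple Instance Learning. The exact formulation used by the authors is not given on this page, so the weighted sum below is an assumption, and all function names, scores, and probabilities are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_sentence_predictions(
    sentence_tag_probs: List[List[float]], salience_scores: List[float]
) -> List[float]:
    """Combine per-sentence tag probabilities into one document-level vector.

    Each sentence contributes in proportion to its attention weight, so
    salient sentences dominate the document-level prediction.
    """
    weights = softmax(salience_scores)
    n_tags = len(sentence_tag_probs[0])
    return [
        sum(w * probs[t] for w, probs in zip(weights, sentence_tag_probs))
        for t in range(n_tags)
    ]


# Three sentences, two tags (illustrative values only).
sentence_probs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.2]]
uniform = aggregate_sentence_predictions(sentence_probs, [0.0, 0.0, 0.0])
focused = aggregate_sentence_predictions(sentence_probs, [10.0, 0.0, 0.0])
print(uniform, focused)
```

With equal salience scores the result is the plain average of the sentence predictions; a high score on the first sentence pulls the document-level prediction toward that sentence's tags.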
Objectives
  • The authors explore three research questions through their experiments:
  • (Q1) For predicting tags from synopses only, can the approach outperform other machine learning models?
  • (Q2) When available, can reviews strengthen the synopsis-to-tag prediction model?
  • (Q3) How relevant are open-vocabulary tags to stories?
Tables
  • Table 1: Statistics of the dataset. S denotes synopses and R denotes review summaries
  • Table 2: Results obtained on the test set using different methodologies on the synopses and after adding reviews with the synopses. TL stands for tags learned. ∗: t-test with p-value
  • Table 3: System-predicted tags for movies released in 2019. The underlined tags match recently assigned tags from users in IMDb
  • Table 4: Tags generated by our system for narratives that are not movie synopses
  • Table 5: Hyper-parameters and their values explored for tuning the model to achieve optimal performance on the validation data. ∗ indicates the value providing the best performance
  • Table 6: Results obtained on the validation set using different methodologies on the synopses and after adding reviews with the synopses. TL stands for tags learned. ∗: t-test with p-value
  • Table 7: Examples of stories that are not movie synopses and the source URL. Tags in the right column are generated by our system
  • Table 8: Examples of plot synopses and review summaries for some movies
  • Table 9: Data from the human evaluation experiment. B represents the tags predicted by the baseline system, N represents the tags predicted by our new system, and R represents the open set tags extracted from the user reviews by our system. If a tag is followed by a number in superscript, the number indicates the number of annotators who selected the tag as relevant to the story. We consider a tag as relevant if it has at least two votes. ♠ indicates the instances where our system’s predictions were more relevant compared to the baseline system, and ♣ indicates the opposite. For the rest of the instances, both systems had a tie. Annotators’ feedback about the helpfulness of the tagsets (closed set tags and open set tags) is presented by emoticons (very helpful, moderately helpful, not helpful). The first three emoticons are the feedback from all the annotators for the tags from the baseline system and our system. The remaining three emoticons are the feedback for the tags extracted from the user reviews
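The relevance rule in the Table 9 caption (a tag counts as relevant if at least two annotators selected it) amounts to a simple vote threshold. A minimal Python sketch follows; the function name and the vote counts are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, List


def relevant_tags(votes: Dict[str, int], min_votes: int = 2) -> List[str]:
    """Return, in alphabetical order, the tags with at least min_votes votes."""
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Hypothetical annotator votes for one story (illustrative values only).
example_votes = {"murder": 3, "cult": 1, "revenge": 2}
print(relevant_tags(example_votes))
```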
Funding
  • This work was also partially supported by NSF grant 1462141
  • Lapata acknowledges the support of ERC (award number 681760, “Translating Multiple Modalities into Text”)
References
  • Stefanos Angelidis and Mirella Lapata. 2018. Multiple instance learning networks for fine-grained sentiment analysis. Transactions of the Association for Computational Linguistics, 6:17–31.
  • John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, and Fabio A. González. 2017. Gated multimodal units for information fusion. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • Douglas Biber. 1992. The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings. Computers and the Humanities, 26(5):331–345.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Thomas G. Dietterich, Richard H. Lathrop, and Tomás Lozano-Pérez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71.
  • Philip John Gorinski and Mirella Lapata. 2018. What’s This Movie About? A Joint Neural Network Architecture for Movie Content Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1770–1781. Association for Computational Linguistics.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Sudipta Kar, Suraj Maharjan, A. Pastor Lopez-Monroy, and Thamar Solorio. 2018a. MPST: A corpus of movie plot synopses with tags. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, France. European Language Resources Association (ELRA).
  • Sudipta Kar, Suraj Maharjan, and Thamar Solorio. 2018b. Folksonomication: Predicting tags for movies from plot synopses using emotion flow encoded neural network. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2879–2891. Association for Computational Linguistics.
  • Ioannis Katakis, Grigorios Tsoumakas, and Ioannis Vlahavas. 2008. Multilabel text classification for automated tag suggestion. In Proceedings of the ECML/PKDD 2008 Discovery Challenge.
  • Jim Keeler and David E. Rumelhart. 1992. A self-organizing integrated segmentation and recognition neural net. In J. E. Moody, S. J. Hanson, and R. P.
  • Brett Kessler, Geoffrey Nunberg, and Hinrich Schütze. 1997. Automatic detection of text genre. In 8th Conference of the European Chapter of the Association for Computational Linguistics.
  • Dimitrios Kotzias, Misha Denil, Nando de Freitas, and Padhraic Smyth. 2015. From group to individual labels using deep features. In KDD.
  • Fangtao Li, Chao Han, Minlie Huang, Xiaoyan Zhu, Ying-Ju Xia, Shu Zhang, and Hao Yu. 2010. Structure-aware review mining and summarization. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 653–661. Coling 2010 Organizing Committee.
  • E. Lin, S. Fang, and J. Wang. 2013. Mining online book reviews for sentimental clustering. In 2013 27th International Conference on Advanced Information Networking and Applications Workshops, pages 179–184.
  • Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1–167.
  • Oded Maron and Aparna Lakshmi Ratan. 1998. Multiple-instance learning for natural scene classification. In The Fifteenth International Conference on Machine Learning, pages 341–349. Morgan Kaufmann.
  • Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Philipp Petrenz. 2012. Cross-Lingual Genre Classification. In Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 11–Association for Computational Linguistics.
  • Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of ICLR.
  • X. Wei, J. Wu, and Z. Zhou. 2014. Scalable multi-instance learning. In 2014 IEEE International Conference on Data Mining, pages 1037–1042.
  • Joseph Worsham and Jugal Kalita. 2018. Genre Identification and the Compositional Effect of Genre in Literature. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1963–1973. Association for Computational Linguistics.
  • Yumo Xu and Mirella Lapata. 2019. Weakly supervised domain detection. Transactions of the Association for Computational Linguistics, 7:581–596.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California. Association for Computational Linguistics.
  • Zhi-Hua Zhou, Yu-Yin Sun, and Yu-Feng Li. 2009. Multi-instance learning by treating instances as non-i.i.d. samples. In Proceedings of the 26th International Conference on Machine Learning.
  • Li Zhuang, Feng Jing, and Xiao-Yan Zhu. 2006. Movie Review Mining and Summarization. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM ’06, pages 43–50, New York, NY, USA. ACM.