Quantitative argument summarization and beyond: Cross domain key point analysis
EMNLP 2020, pp. 39-49 (2020)
When summarizing a collection of views, arguments or opinions on some topic, it is often desirable not only to extract the most salient points, but also to quantify their prevalence. Work on multi-document summarization has traditionally focused on creating textual summaries, which lack this quantitative aspect.
- The need for summarizing views, arguments and opinions on a given topic is common to many text analytics applications, across a variety of domains.
- The authors hereafter refer to such utterances that express an opinion, view, argument, ask, or suggestion collectively as comments.
- Compressing such textual collections into short summaries relies on their inherent redundancy.
- The goal of Multi-Document Summarization (MDS) algorithms is to create short textual summaries from document clusters sharing the same topic.
- These summaries aim to capture most of the relevant information in the input clusters, while removing redundancies.
- The users may want to drill down to view the comments that were mapped to a specific point in the summary
- Our method first selects short, high quality comments as key point candidates. It leverages previous work on argument-to-key-point matching to select a subset of the candidates that achieve high coverage of the data. We show that this relatively simple approach for key point extraction achieves results on argumentation data that are on par with human experts
- Key Point Analysis is a novel framework for summarizing arguments, opinions and views
- We present an automatic method for key point extraction, which is shown to perform on par with a human expert
- Our work demonstrates the potential of key point analysis in multiple domains besides argumentation
- We show that the necessary knowledge for key point analysis, once acquired by supervised learning from argumentation data, can be successfully applied cross-domain, making it unnecessary to collect domain specific labeled data for each target domain
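The extraction method summarized above (select short, high-quality comments as candidates, then pick a subset that covers as many comments as possible) can be illustrated as a greedy set-cover loop. This is a minimal sketch, not the authors' implementation: `match` stands in for the paper's trained argument-to-key-point matching model, and `toy_match` below is only a word-overlap placeholder.

```python
def greedy_select_key_points(candidates, comments, match, threshold, k):
    """Pick up to k candidates, each maximizing newly covered comments."""
    selected = []
    uncovered = set(range(len(comments)))
    for _ in range(k):
        best, best_gain = None, set()
        for cand in candidates:
            if cand in selected:
                continue
            gain = {i for i in uncovered
                    if match(comments[i], cand) >= threshold}
            if len(gain) > len(best_gain):
                best, best_gain = cand, gain
        if best is None or not best_gain:
            break  # no remaining candidate covers any uncovered comment
        selected.append(best)
        uncovered -= best_gain
    return selected

# Toy stand-in for the trained matching model: word-overlap ratio.
def toy_match(comment, key_point):
    c, k = set(comment.lower().split()), set(key_point.lower().split())
    return len(c & k) / max(len(k), 1)
```

In the paper the match scores come from a fine-tuned transformer classifier; any scoring function with the same signature slots into this loop.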
- 4.1 Evaluation Method
Let D be a dataset, T the set of topics in D, Ct the set of comments for a topic t ∈ T, and Kt the set of key points extracted for t.
- The authors' goal is to achieve both high precision and high coverage, but there is typically a tradeoff between the two
- This tradeoff can be controlled by setting a threshold on the match score, and applying the BM+TH selection policy to match only a subset of the comments to the key points.
- The authors explore this tradeoff by measuring the precision for different levels of coverage.
- The authors measure precision at coverage levels of 0.2, 0.4, …, 1.0
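The precision-at-coverage measurement described above can be sketched as follows: rank comments by their best match score, keep the top fraction (the thresholding behind the BM+TH policy), and compute precision over the kept matches. The input format, a list of (best match score, is-correct) pairs per comment, is an assumption for illustration.

```python
def precision_at_coverage(scored_matches, coverage):
    """Precision over the top-scoring `coverage` fraction of comments.

    `scored_matches` is a list of (best_match_score, is_correct) pairs,
    one per comment. Keeping only the highest-scoring fraction mimics
    setting a threshold on the match score.
    """
    ranked = sorted(scored_matches, key=lambda p: p[0], reverse=True)
    n = max(1, round(coverage * len(ranked)))
    kept = ranked[:n]
    return sum(1 for _, correct in kept if correct) / len(kept)
```

Sweeping `coverage` over 0.2, 0.4, …, 1.0 traces out the precision/coverage tradeoff curve the authors report.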
- Results and Discussion
The results for the three datasets are summarized in Table 3.
- Table 4 shows an example of key points generated for one of the topic+stance pairs in the Arguments dataset, and their distribution over the comments for that topic and stance.
- The authors compared the automatic key point extraction to the approach taken by Bar-Haim et al. (2020), where key points were manually created by a debate expert.
- Comparing the results for these key points with the automatic results for the same number of key points shows similar precision (0.696 vs. 0.708) over all the comments, with the manual key points slightly higher.
- Key Point Analysis provides both a textual and a quantitative view of the main points in the summarized data, and allows the user to interactively drill down from points to the actual sentences they cover.
- Table 1: Argument-to-key-point matching results on the ArgKP dataset
- Table 2: Number of topics and comments per dataset
- Annotator agreement was estimated over different sets of annotators as follows: 300 random comment-key point pairs were selected from the Arguments dataset. Each pair was annotated by 14 different annotators. The annotations for each pair were randomly split into two sets, such that each pair in each set had 7 annotations. After processing each set to produce majority labels, the Cohen's Kappa obtained between the pair labels of the two sets was 0.63
- Table 3: Results for the Arguments, Survey, and Reviews datasets
- Table 4: Top key points and their coverage for the topic "We should abolish the three-strikes laws" and Pro stance from the Arguments dataset, when generating up to 10 key points using the selection algorithm. After generating the key points list, each of the 267 comments is matched to a key point using the BM selection policy
- Table 5: Top key points for the City of Austin Community Survey. The match threshold was set so that the extracted 20 key points cover 60% of the sampled comments. For each key point we show the percentage of matching comments (out of the sampled comments), the precision of matched comments, and the top two matching comments. All comments shown in the tables were judged as correct matches, except for the one marked with '*'
- Table 6: Top two key points extracted from gold summaries and from original reviews, on selected topics from the Reviews dataset. For each key point we show the percentage of matching comments, with match threshold 0.999
- Table 7: Run time (hours:minutes:seconds)
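The annotator-agreement procedure reported for Table 2 (majority labels over two disjoint sets of 7 annotations each, then Cohen's Kappa between the resulting label pairs) can be sketched in a few lines. This is a generic illustration of the two computations involved, not the authors' code.

```python
from collections import Counter

def majority_label(labels):
    """Majority vote over binary match/no-match annotations."""
    return Counter(labels).most_common(1)[0][0]

def cohens_kappa(a, b):
    """Cohen's Kappa between two parallel binary label sequences."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # expected agreement under independent per-annotator marginals
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (po - pe) / (1 - pe) if pe < 1 else 1.0
```

Applying `majority_label` to each 7-annotation half of a pair, then `cohens_kappa` across the 300 resulting label pairs, reproduces the structure of the reported 0.63 agreement figure.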
- The task of multi-document summarization (MDS) is defined as creating a summary that is both concise and comprehensive from multiple texts with a shared theme, e.g., news articles covering the same event (McKeown et al., 2002).
Excerpt from Table 5 (City of Austin Community Survey):

| Key point | % | Precision | Top comments |
|---|---|---|---|
| Consider a monorail system to help traffic congestion | 9% | 0.74 | "Need much, much better traffic flow (example, 183 or 620, Palmer)."; "Traffic flow is terrible!" |
| Austin needs better public transportation | 8% | 0.90 | "For a progressive city, Austin is lacking in public transportation."; "Make improvements to public transportation in north Austin." |
| Affordability of housing and living in Austin | 5% | 0.85 | "Address rapidly increasing cost of living" |
- For each fold, we selected the threshold t that maximizes the F1 score over the dev set
- The model that achieves the best F1 score is ALBERT, with an F1 score of 0.809
- Annotators choosing the wrong stance in more than 15% of their annotations were ignored
- We consider a pair as a match if it was labeled as a match by more than 50% of the annotators
- When matching 60% of the comments, we achieve precision above 90%
- Using a Z test for two population proportions, with p = 0.05 for coverage of 0.6, and p = 0.01 for coverage of 0.8 and 1.0
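The threshold selection step above (pick the t that maximizes F1 on the dev set) amounts to a simple sweep over candidate thresholds. A minimal sketch, assuming the dev set is available as (match score, gold label) pairs:

```python
def best_f1_threshold(scored):
    """Return (threshold, f1) maximizing F1 over (score, gold) pairs.

    Candidate thresholds are the observed scores themselves, since F1
    only changes when the threshold crosses a score.
    """
    candidates = sorted({s for s, _ in scored})
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for s, g in scored if s >= t and g)
        fp = sum(1 for s, g in scored if s >= t and not g)
        fn = sum(1 for s, g in scored if s < t and g)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Running this once per cross-validation fold, as the bullet describes, yields one tuned threshold per fold.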
Study subjects and analysis
4.2 Datasets. We test our key point analysis method on three datasets: Arguments, Survey and Reviews.
Arguments dataset: We used the 28 topics of the ArgKP dataset as the training set (24 topics) and development set (4 topics) for the comment matching classifier, which used the selected model as described in Section 2. This model was applied to all three datasets. The remaining 43 topics in the Arguments dataset were used as the test set. Following Bar-Haim et al. (2020), we perform key point analysis per topic+stance, 86 pairs in total. We trained two versions of the argument quality classifier.
We did not apply this filter to the other datasets, as we found the quality predictions to be less indicative for their comments. Table 2 lists the number of topics and comments in the three datasets, before and after filtering. When selecting key point candidates, we aimed to extract about 20% of the shorter, higher quality comments
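The candidate selection heuristic mentioned here (extract about 20% of the shorter, higher-quality comments) might look like the following sketch. The exact ranking the authors use is not specified in this summary, so the combined sort key, word count first and predicted quality as a tiebreaker, is an assumption for illustration.

```python
def select_candidates(comments, quality, frac=0.2):
    """Keep roughly the top `frac` of comments, preferring shorter
    comments with higher predicted quality.

    `quality` maps each comment to a score from a quality classifier
    (a stand-in for the trained argument quality model).
    """
    ranked = sorted(comments, key=lambda c: (len(c.split()), -quality[c]))
    k = max(1, round(frac * len(comments)))
    return ranked[:k]
```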
4.5 Results and Discussion. The results for the three datasets are summarized in Table 3. Fully automatic key point analysis is shown to perform well on the Arguments test set: precision of 0.752 and 0.792 when matching all the comments to 5 and 10 key points, respectively
- Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, and Benno Stein. 2019. Modeling frames in argumentation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLPIJCNLP), pages 2922–2932, Hong Kong, China. Association for Computational Linguistics.
- Roy Bar-Haim, Lilach Eden, Roni Friedman, Yoav Kantor, Dan Lahav, and Noam Slonim. 2020. From arguments to key points: Towards automatic argument summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4029–4039, Online. Association for Computational Linguistics.
- Filip Boltuzic and Jan Snajder. 2014. Back up your stance: Recognizing arguments in online discussions. In Proceedings of the First Workshop on Argumentation Mining, pages 49–58, Baltimore, Maryland. Association for Computational Linguistics.
- Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2013. Towards coherent multidocument summarization. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1163–1173, Atlanta, Georgia. Association for Computational Linguistics.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.
- Charlie Egan, Advaith Siddharthan, and Adam Wyner. 2016. Summarising the points made in online political debates. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 134–143, Berlin, Germany. Association for Computational Linguistics.
- Gunes Erkan and Dragomir R. Radev. 2004. Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res., 22:457–479.
- Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 340–348, Beijing, China. Coling 2010 Organizing Committee.
- Shai Gretz, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, and Noam Slonim. 2020. A large-scale dataset for argument quality ranking: Construction and analysis. In AAAI.
- Kazi Saidul Hasan and Vincent Ng. 2014. Why are you taking this stance? identifying and classifying reasons in ideological debates. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 751– 762, Doha, Qatar. Association for Computational Linguistics.
- Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 168–177, New York, NY, USA. ACM.
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
- Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating wikipedia by summarizing long sequences. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
- Kathleen R. McKeown, Regina Barzilay, David Evans, Vasileios Hatzivassiloglou, Judith L. Klavans, Ani Nenkova, Carl Sable, Barry Schiffman, and Sergey Sigelman. 2002. Tracking and summarizing news on a daily basis with columbia’s newsblaster. In Proceedings of the Second International Conference on Human Language Technology Research, HLT ’02, page 280–285, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
- Amita Misra, Brian Ecker, and Marilyn Walker. 2016. Measuring the similarity of sentential arguments in dialogue. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 276–287, Los Angeles. Association for Computational Linguistics.
- Nona Naderi. 2016. Argumentation mining in parliamentary discourse. In Proceedings of the 11th International Conference of the Ontario Society for the Study of Argumentation, pages 1–9.
- Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019. Classification and clustering of arguments with contextualized word embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 567– 578, Florence, Italy. Association for Computational Linguistics.
- Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Ng. 2008. Cheap and fast – but is it good? evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 254–263, Honolulu, Hawaii. Association for Computational Linguistics.
- Benjamin Snyder and Regina Barzilay. 2007. Multiple aspect ranking using the good grief algorithm. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 300–307, Rochester, New York. Association for Computational Linguistics.
- Ivan Titov and Ryan McDonald. 2008. A joint model of text and aspect ratings for sentiment summarization. In Proceedings of ACL-08: HLT, pages 308– 316, Columbus, Ohio. Association for Computational Linguistics.
- Assaf Toledo, Shai Gretz, Edo Cohen-Karlik, Roni Friedman, Elad Venezian, Dan Lahav, Michal Jacovi, Ranit Aharonov, and Noam Slonim. 2019. Automatic argument quality assessment - new datasets and methods. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLPIJCNLP), pages 5624–5634, Hong Kong, China. Association for Computational Linguistics.
- Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2018, Brussels, Belgium, November 1, 2018, pages 353–355. Association for Computational Linguistics.
- Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 5754–5764.