
Summarizing Text on Any Aspects: A Knowledge Informed Weakly Supervised Approach

EMNLP 2020, pp.6301-6309, (2020)


Abstract

Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect. Previous studies usually assume a small pre-defined set of aspects and fall short of summarizing on other diverse topics. In this work, we study summarizing on arbitrary aspect…

Introduction
  • Remarkable progress has been made in generating generic summaries of documents (Nallapati et al, 2016; See et al, 2017; Narayan et al, 2018), partially due to the large amount of supervision data available.
  • A document, such as a news article or a medical report, can span multiple topics or aspects.
  • A key challenge of the task is the lack of direct supervision data containing documents paired with multiple aspect-based summaries.
Highlights
  • Remarkable progress has been made in generating generic summaries of documents (Nallapati et al, 2016; See et al, 2017; Narayan et al, 2018), partially due to the large amount of supervision data available
  • This paper aims to go beyond pre-defined aspects and enable summarization on arbitrary aspects relevant to the document
  • We develop a new approach that integrates rich external knowledge in both aspect modeling and weak supervision construction
  • Experiments on real news articles show our approach achieves performance boosts over existing methods
  • We aim to enable summarization on any aspects, and develop new weak supervisions by integrating rich external knowledge
  • To tackle the challenge of lacking supervised data, we have developed a new knowledge-informed weakly supervised method that leverages external knowledge bases
Methods
  • Setup: The authors construct weak supervisions from 100K of the 280K pairs in the training set of the CNN/DailyMail dataset (Hermann et al, 2015).
  • Although the synthetic dataset's aspects are restricted to only six coarse-grained topics, the synthetic domain facilitates automatic evaluation, providing a testbed for (1) comparison with previous models and (2) studying the generalization ability of the weak-supervision approach when adapting to a new domain.
  • The dataset has 280K/10K/10K examples in the train/dev/test sets, respectively, and contains six pre-defined aspects: {"sport", "health", "travel", "news", "science technology", "tv showbiz"}
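The weak-supervision construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `extract_aspects` is a toy stand-in for a real NER tagger (a capitalized-word heuristic), and `related_words` stands in for lookups in an external knowledge base.

```python
import re

def extract_aspects(text):
    """Toy stand-in for an NER tagger: treat capitalized tokens
    (excluding sentence-initial ones) as candidate aspects."""
    aspects = set()
    sent_start = True  # are we at the start of a sentence?
    for tok in text.split():
        word = tok.strip('.,;:!?"')
        if word and word[0].isupper() and not sent_start:
            aspects.add(word)
        sent_start = tok.endswith(('.', '!', '?'))
    return aspects

def build_weak_pairs(generic_summary, related_words):
    """For each candidate aspect, keep only the summary sentences that
    mention the aspect or one of its knowledge-base related words.
    Paired with the source document, each (aspect, summary) result
    forms one weakly supervised training example."""
    sentences = re.split(r'(?<=[.!?])\s+', generic_summary)
    pairs = []
    for aspect in extract_aspects(generic_summary):
        cues = {aspect.lower()} | set(related_words.get(aspect, []))
        kept = [s for s in sentences
                if any(c in s.lower() for c in cues)]
        if kept:
            pairs.append((aspect, ' '.join(kept)))
    return pairs

summary = ("Colony collapse disorder has killed millions of bees in Australia. "
           "Scientists suspect a virus.")
related = {"Australia": ["country", "oceania"]}  # hypothetical KB entries
pairs = build_weak_pairs(summary, related)
# pairs -> [("Australia", "Colony collapse disorder has killed millions of bees in Australia.")]
```

The real pipeline applies this to (document, generic summary) pairs from CNN/DailyMail and uses NER plus external knowledge bases rather than these heuristics.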
Results
  • Experiments show the approach achieves performance boosts on summarizing both real and synthetic documents given pre-defined or arbitrary aspects.
  • Experiments on real news articles show the approach achieves performance boosts over existing methods.
  • By further using only 1K MA-News instances to continue fine-tuning the model, the authors achieve performance boosts compared to both SF and BART MA-News-Sup 1K
Conclusion
  • This paper studies the new problem of summarizing a document on arbitrary relevant aspects.
  • To tackle the challenge of lacking supervised data, the authors have developed a new knowledge-informed weakly supervised method that leverages external knowledge bases.
  • The promising empirical results motivate the authors to explore further the integration of more external knowledge and other rich forms of supervision (Hu and Xing, 2020; Ziegler et al, 2019) in learning.
  • The authors are interested in extending aspect-based summarization to more application scenarios
Tables
  • Table1: Results (ROUGE) on the MA-News test set. The results of Lead-3, PG-Net and SF are from (Frermann and Klementiev, 2019), where SF is the previous best model. Our approach trains with only weak supervisions (sec 3.1) or with additional 1K MA-News supervised training data
  • Table2: Fine-tuning BART on the synthetic domain, evaluated on the MA-News test set. Weak-Sup trains BART only with our weak supervisions. MA-News-Sup 1K trains with 1K MA-News supervised examples. +Weak-Sup trains first with weak supervisions and then with supervision on MA-News
  • Table3: Human evaluation using a 5-point Likert scale. MA-News 280K trains BART with the whole MA-News set. Weak-Sup trains with our weak supervisions. +MA-News 3K further fine-tunes with 3K MA-News instances
  • Table4: Generated summaries of a document on different aspects. Document content relevant to specific aspects is highlighted in respective colors. "Related words" identified through Wikipedia (sec 3.2) are highlighted in bold
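The ROUGE numbers reported in Tables 1 and 2 come from the standard ROUGE toolkit; as a self-contained illustration of what such a score measures, here is a minimal ROUGE-1 F1 computation (unigram overlap only, no stemming or the other options the official toolkit supports):

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the cat sat on the mat", "the cat is on the mat")
# score -> 0.8333... (5 of 6 unigrams overlap in each direction)
```

For reproducing published numbers, a standard implementation (e.g., the ROUGE-1.5.5 script or a maintained Python port) should be used instead.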
Related work
  • Aspect-based summarization, as an instance of controllable text generation (Hu et al, 2017; Ficler and Goldberg, 2017), offers extra controllability compared to generic summarization.

    Figure example: for a document on colony collapse disorder, NER extracts the aspects {bees, Australia, U.S.}, and ConceptNet supplies related words such as {insect, fly, colony, flower, country, Great Barrier Reef, Oceania, koala, ...}. Generic summary: "Colony collapse disorder has killed millions of bees. Scientists suspect a virus may combine with other factors to collapse colonies. Disorder first cropped up in 2004, as bees were imported from Australia. $15 billion in U.S. crops each year dependent on bees for pollination."
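The aspect-expansion step in the figure can be sketched with a toy in-memory relation table. The real system queries the ConceptNet knowledge graph; the hand-coded entries below merely mirror the figure's example and are not actual ConceptNet output.

```python
# Toy stand-in for a knowledge base: hand-coded 1-hop neighbours
# mirroring the figure's example (not real ConceptNet data).
KNOWLEDGE = {
    "bees": {"insect", "fly", "colony", "flower"},
    "Australia": {"country", "Great Barrier Reef", "Oceania", "koala"},
}

def related_words(aspects):
    """Union of knowledge-base neighbours for each extracted aspect.
    Aspects missing from the KB (e.g., 'U.S.' here) contribute nothing."""
    cues = set()
    for aspect in aspects:
        cues |= KNOWLEDGE.get(aspect, set())
    return cues

cues = related_words({"bees", "Australia", "U.S."})
# cues contains "insect", "koala", etc., but nothing for "U.S."
```

These related words widen the match between an aspect and document or summary sentences, which is what lets the weak-supervision step find aspect-relevant content beyond exact string matches.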
Funding
  • Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents given pre-defined or arbitrary aspects
  • Experiments on real news articles show our approach achieves performance boosts over existing methods
  • By further using only 1K MA-News instances to continue fine-tuning the model, we achieve performance boosts compared to both SF and BART MA-News-Sup 1K
Study subjects and analysis
articles: 50
We use the All The News corpus (Kaggle, 2020), from which we randomly extract 50 articles from publications other than CNN (so that no articles are included in the weak supervision). We ask human annotators to label an arbitrary relevant aspect for each article. Example output for Aspect: vote — Summary: "Polls show that at least 83 percent of the U.S. electorate is opposed to expanding immigration and that 85 percent of black voters oppose the plan to admit more than 100,000 middle eastern refugees to the country."

Reference
  • Stefanos Angelidis and Mirella Lapata. 2018. Summarizing opinions: Aspect extraction meets sentiment prediction and they are both weakly supervised. In EMNLP.
  • John M. Conroy, Judith D. Schlesinger, and Dianne P. O'Leary. 2006. Topic-focused multi-document summarization using an approximate oracle score. In COLING/ACL, pages 152–159.
  • Hoa Trang Dang. 2005. Overview of DUC 2005. In DUC, volume 2005, pages 1–12.
  • Hal Daumé III and Daniel Marcu. 2006. Bayesian query-focused summarization. In COLING/ACL, pages 305–312.
  • Daniel Deutsch and Dan Roth. 2019. Summary cloze: A new task for content selection in topic-focused summarization. In EMNLP, pages 3711–3720.
  • Jessica Ficler and Yoav Goldberg. 2017. Controlling linguistic style aspects in neural language generation. arXiv preprint arXiv:1707.02633.
  • Lea Frermann and Alexandre Klementiev. 2019. Inducing document structure for aspect-based summarization. In ACL, pages 6263–6273.
  • Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In NeurIPS, pages 1693–1701.
  • Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In KDD, pages 168–177.
  • Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric Xing. 2016. Harnessing deep neural networks with logic rules. In ACL.
  • Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, et al. 2019a. Texar: A modularized, versatile, and extensible toolkit for text generation. In ACL 2019, System Demonstrations.
  • Zhiting Hu, Bowen Tan, Russ R. Salakhutdinov, Tom M. Mitchell, and Eric P. Xing. 2019b. Learning data manipulation for augmentation and weighting. In NeurIPS.
  • Zhiting Hu and Eric P. Xing. 2020. Learning from all types of experiences: A unifying machine learning perspective. In KDD.
  • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In ICML.
  • Kaggle. 2020. All the News 2.0: 2.7 million news articles. https://components.one/datasets/all-the-news-2-news-articles-dataset/.
  • Kundan Krishna and Balaji Vasan Srinivasan. 2018. Generating topic-oriented summaries using neural attention. In NAACL, pages 1697–1705.
  • Wojciech Kryscinski, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Neural text summarization: A critical evaluation. In EMNLP.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Chin-Yew Lin and Eduard Hovy. 2000. The automated acquisition of topic signatures for text summarization. In COLING, pages 495–501.
  • Yan Liu, Sheng-hua Zhong, and Wenjie Li. 2012. Query-oriented multi-document summarization via unsupervised deep learning. In AAAI.
  • Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In CoNLL.
  • Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. In EMNLP.
  • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In NAACL.
  • Haoruo Peng, Yangqiu Song, and Dan Roth. 2016. Event detection and co-reference with minimal supervision. In EMNLP, pages 392–402.
  • Ana-Maria Popescu and Oren Etzioni. 2007. Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining, pages 9–28. Springer.
  • Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: Rapid training data creation with weak supervision. In VLDB.
  • Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In ACL, pages 86–96.
  • Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In AAAI.
  • Lu Wang and Wang Ling. 2016. Neural network-based abstract generation for opinions and arguments. In NAACL.
  • Jason Wei and Kai Zou. 2019. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. In EMNLP, pages 6383–6389.
  • 2020. Conditional self-attention for query-based summarization. arXiv preprint arXiv:2002.07338.
  • Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593.