
Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

EMNLP 2020, pp. 8068–8074


Abstract

Multi-document summarization is a challenging task for which few large-scale datasets exist. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references.

Introduction
  • Single document summarization is the focus of most current summarization research thanks to the availability of large-scale single-document summarization datasets spanning multiple fields, including news (CNN/DailyMail (Hermann et al, 2015), NYT (Sandhaus, 2008), Newsroom (Grusky et al, 2018), XSum (Narayan et al, 2018a)), law (BigPatent (Sharma et al, 2019)), and even science (ArXiv and PubMed (Cohan et al, 2018))
  • These large-scale datasets are a necessity for modern data-hungry neural architectures (e.g. Transformers (Vaswani et al, 2017)) to shine at the summarization task.
  • Example input sources from the Table 1 excerpt: Source 2 — "This paper presents a method for the resolution of lexical ambiguity of nouns."; Source 4 — "In ... word sense disambiguation ... integrates a diverse set of knowledge sources ..., including part of speech of neighboring words, morphological form"
Highlights
  • We introduce a challenging multi-document summarization task: write the related work section of a paper using its abstract and reference papers
  • We introduce Multi-XScience, a large-scale dataset for multidocument summarization (MDS) using scientific articles
  • Multi-XScience is better suited to abstractive summarization than previous MDS datasets, since it requires summarization models to exhibit high text understanding and abstraction capabilities
  • Experimental results show that our dataset is amenable to abstractive summarization models and is challenging for current models
Results
  • The authors study the performance of multiple state-of-the-art models on the Multi-XScience dataset.
  • Two of these, HiMAP and HierSumm (Liu and Lapata, 2019a), handle multiple input documents with a fusion mechanism that transforms the documents in vector space.
  • HiMAP adapts a pointer-generator model (See et al, 2017) with maximal marginal relevance (MMR) (Carbonell and Goldstein, 1998; Lebanoff et al, 2018) to compute weights over multi-document inputs.
  • HierSumm (Liu and Lapata, 2019a) uses a passage ranker that selects the most important document as the input to the hierarchical transformer-based generation model
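The relevance–redundancy trade-off behind HiMAP's MMR component can be sketched as follows. This is a generic illustration of greedy maximal marginal relevance (Carbonell and Goldstein, 1998), not the authors' implementation; the Jaccard similarity and λ value are placeholder choices:

```python
def jaccard(a, b):
    """Word-overlap (Jaccard) similarity between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr_select(candidates, query, sim, k, lam=0.7):
    """Greedily pick k candidates by Maximal Marginal Relevance:
    score(c) = lam * sim(c, query) - (1 - lam) * max_{s in selected} sim(c, s).
    The second term penalizes redundancy with already-selected items."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: lam * sim(c, query)
            - (1 - lam) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


query = "neural multi document summarization"
cands = [
    "neural multi document summarization models",   # relevant
    "neural multi document summarization systems",  # relevant but redundant
    "graph based sentence ranking",                 # less relevant, diverse
]
# With a low lambda, diversity dominates: the redundant second candidate
# is skipped in favor of the diverse third one.
print(mmr_select(cands, query, jaccard, k=2, lam=0.3))
# -> ['neural multi document summarization models', 'graph based sentence ranking']
```

Note that HiMAP uses MMR scores as soft weights over the multi-document inputs rather than for hard selection; the greedy selection above only illustrates the underlying trade-off.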
Conclusion
  • The lack of large-scale datasets has slowed the progress of multi-document summarization (MDS) research.
  • The authors introduce Multi-XScience, a large-scale dataset for MDS using scientific articles.
  • Multi-XScience is better suited to abstractive summarization than previous MDS datasets, since it requires summarization models to exhibit high text understanding and abstraction capabilities.
  • Experimental results show that the dataset is amenable to abstractive summarization models and is challenging for current models
Tables
  • Table1: An example from our Multi-XScience dataset showing the input documents and the related work of the target paper. Text is colored based on semantic similarity between sources and related work
  • Table2: Comparison of large-scale multi-document summarization datasets. We propose Multi-XScience. Average document length (“doc. len”) is calculated by concatenating all input sources (multiple reference documents)
  • Table3: The proportion of novel n-grams in the target reference summaries across different summarization datasets. The first and second block compare single-document and multidocument summarization datasets, respectively
  • Table4: ROUGE scores for the LEAD and EXT-ORACLE baselines for different summarization datasets
  • Table5: Dataset quality evaluation criteria
  • Table6: ROUGE results on Multi-XScience test set
  • Table7: The proportion of novel n-grams in generated summaries. PG (CNNDM) and PG (XSUM) denote the pointer-generator model performance reported in the original papers (See et al, 2017; Narayan et al, 2018b), trained on different datasets. All remaining results are from models trained on the Multi-XScience dataset
  • Table8: Generation examples from the extractive oracle (EXT-ORACLE), HiMAP, and Pointer-Generator (PG)
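The novel n-gram proportion reported in Tables 3 and 7 can be sketched as below. This is a minimal illustration that treats n-grams as sets; the paper's exact tokenization and counting may differ:

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_proportion(source, summary, n):
    """Fraction of the summary's n-grams that never occur in the source.
    Higher values indicate a more abstractive target."""
    summ = ngrams(summary.split(), n)
    if not summ:
        return 0.0
    return len(summ - ngrams(source.split(), n)) / len(summ)


src = "the model achieves strong results on this task"
tgt = "the model obtains strong results"
print(novel_ngram_proportion(src, tgt, 1))  # 0.2 ("obtains" is novel)
print(novel_ngram_proportion(src, tgt, 2))  # 0.5 (two of four bigrams are novel)
```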
Related work
  • Scientific document summarization is a challenging task. Multiple models trained on small datasets exist for this task (Hu and Wan, 2014; Jaidka et al, 2013; Hoang and Kan, 2010), as no large-scale datasets were available before this paper. Attempts at creating scientific summarization datasets have been emerging, but not at the scale required for training neural-based models. For example, CL-Scisumm (Jaidka et al, 2016) created datasets from the ACL Anthology with 30–50 articles; Yasunaga et al. and AbuRa'ed et al.10 proposed human-annotated datasets with at most 1,000 article and summary pairs. We believe that the lack of large-scale datasets slowed down the development of multi-document summarization research.

    10This is concurrent work.
Funding
  • This work is supported by the Canadian Institute For Advanced Research (CIFAR) through its AI chair program and an IVADO fundamental research grant
Study subjects and analysis
pairs: 25
Human Evaluation on Dataset Quality: Two human judges evaluated the overlap between the sources and the target on 25 pairs randomly selected from the test set. They scored each pair using the scale shown in Table 5

samples: 25
Human Evaluation: We conduct human evaluation on EXT-ORACLE, HiMAP, and Pointer-Generator, since each outperforms the others in its respective section of Table 6. For evaluation, we randomly select 25 samples and present the system outputs. The scores are computed with the ROUGE-1.5.5 script with options "-c 95 -r 1000 -n 2 -a -m"

Reference
  • Iz Beltagy, Arman Cohan, and Kyle Lo. 2019. Scibert: Pretrained contextualized embeddings for scientific text. arXiv preprint arXiv:1903.10676.
  • Alexander Richard Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir Radev. 2019. Multi-news: A largescale multi-document summarization dataset and abstractive hierarchical model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1074–1084.
  • Matt Grenander, Yue Dong, Jackie Chi Kit Cheung, and Annie Louis. 2019. Countering the effects of lead bias in news summarization via multi-stage training and auxiliary losses. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6021–6026.
  • Max Grusky, Mor Naaman, and Yoav Artzi. 2018. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 708–719.
  • Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in neural information processing systems, pages 1693–1701.
  • Cong Duy Vu Hoang and Min-Yen Kan. 2010. Towards automated related work summarization. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 427–435.
  • Jaime Carbonell and Jade Goldstein. 1998. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336.
  • Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. A discourse-aware attention model for abstractive summarization of long documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 615–621.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Yue Hu and Xiaojun Wan. 2014. Automatic generation of related work sections in scientific papers: an optimization approach. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1624–1633.
  • Kokil Jaidka, Muthu Kumar Chandrasekaran, Sajal Rustagi, and Min-Yen Kan. 2016. Overview of the cl-scisumm 2016 shared task. In Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pages 93–102.
  • Kokil Jaidka, Christopher Khoo, and Jin-Cheon Na. 2013. Deconstructing human literature reviews–a framework for multi-document summarization. In proceedings of the 14th European workshop on natural language generation, pages 125–135.
  • Logan Lebanoff, Kaiqiang Song, and Fei Liu. 2018. Adapting the neural encoder-decoder framework from single to multi-document summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4131–4141.
  • Gunes Erkan and Dragomir R Radev. 2004. Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research, 22:457–479.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating wikipedia by summarizing long sequences. In International Conference on Learning Representations.
  • Yang Liu and Mirella Lapata. 2019a. Hierarchical transformers for multi-document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5070– 5081.
  • Yang Liu and Mirella Lapata. 2019b. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3721–3731.
  • Rada Mihalcea and Paul Tarau. 2004. Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing, pages 404–411.
  • Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. Summarunner: A recurrent neural network based sequence model for extractive summarization of documents. In Thirty-First AAAI Conference on Artificial Intelligence.
  • Shashi Narayan, Shay B Cohen, and Mirella Lapata. 2018a. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807.
  • Shashi Narayan, Shay B Cohen, and Mirella Lapata. 2018b. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807.
  • Evan Sandhaus. 2008. The new york times annotated corpus. Linguistic Data Consortium, Philadelphia, 6(12):e26752.
  • Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointergenerator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1073–1083.
  • Eva Sharma, Chen Li, and Lu Wang. 2019. Bigpatent: A large-scale dataset for abstractive and coherent summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2204–2213.
  • Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In Proceedings of the 24th international conference on world wide web, pages 243–246.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
  • Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R Fabbri, Irene Li, Dan Friedman, and Dragomir R Radev. 2019. Scisummnet: A large annotated corpus and content-impact models for scientific paper summarization with citation networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7386–7393.
Author
Yao Lu
Yue Dong