Leveraging Graph to Improve Abstractive Multi-Document Summarization

ACL, pp. 6232-6243, 2020.


Abstract:

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents, such as similarity graphs and discourse graphs, to more effectively process multiple input documents and produce abstractive summaries.
Introduction
  • Multi-document summarization (MDS) brings great challenges to the widely used sequence-to-sequence (Seq2Seq) neural architecture, as it requires effective representation of multiple input documents and content organization of long summaries.
  • Graph representations of documents, such as similarity graphs based on lexical similarities (Erkan and Radev, 2004) and discourse graphs based on discourse relations (Christensen et al, 2013), have been widely used in traditional graph-based extractive MDS models.
  • However, they have not been well studied by most abstractive approaches, especially end-to-end neural approaches.
  • Little work has studied the effectiveness of explicit graph representations in neural abstractive MDS.
Highlights
  • Multi-document summarization (MDS) brings great challenges to the widely used sequence-to-sequence (Seq2Seq) neural architecture, as it requires effective representation of multiple input documents and content organization of long summaries.
  • Our experimental results show that our graph model significantly improves the performance of pre-trained language models on multi-document summarization.
  • Our work demonstrates the effectiveness of graph modeling in neural abstractive multi-document summarization.
  • The results show that our model GraphSum consistently outperforms all baselines, which further demonstrates the effectiveness of our model on different types of corpora.
  • In this paper we explore the importance of graph representations in multi-document summarization and propose to leverage graphs to improve the performance of neural abstractive multi-document summarization.
  • We propose an effective method to combine our model with pre-trained language models, which further improves the performance of multi-document summarization significantly.
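    The last highlight, combining the model with pre-trained language models, can be illustrated very roughly by extracting contextual token representations from a pre-trained encoder for a downstream summarizer to consume. The sketch below is an assumption-laden illustration, not the paper's actual combination method; it assumes the HuggingFace transformers package and the roberta-base checkpoint.

    # A rough sketch (not the paper's exact method) of obtaining frozen RoBERTa
    # token representations that a downstream summarization encoder could consume.
    import torch
    from transformers import RobertaModel, RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    roberta = RobertaModel.from_pretrained("roberta-base")
    roberta.eval()

    paragraphs = [
        "Wildfires spread across the region on Monday.",
        "Officials ordered evacuations in three counties.",
    ]

    with torch.no_grad():
        batch = tokenizer(paragraphs, padding=True, truncation=True, return_tensors="pt")
        outputs = roberta(**batch)

    # Shape: (num_paragraphs, seq_len, hidden_size); these contextual embeddings
    # could replace randomly initialized token embeddings in a summarizer.
    token_embeddings = outputs.last_hidden_state
    print(token_embeddings.shape)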
Methods
  • 4.1 Experimental Setup

    Graph Representations: The authors experiment with three well-established graph representations: similarity graph, topic graph and discourse graph.
  • The similarity graph is built based on tf-idf cosine similarities between paragraphs to capture lexical relations.
  • The topic graph is built based on the LDA topic model (Blei et al, 2003) to capture topic relations between paragraphs.
  • The discourse graph is built to capture discourse relations based on discourse markers, co-reference and entity links as in Christensen et al (2013).
  • If not explicitly stated otherwise, the authors use the similarity graph by default, as it has been the most widely used in previous work (a construction sketch follows this list).
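    The default similarity graph described above can be sketched as follows: paragraphs are embedded as tf-idf vectors and pairwise cosine similarities form a weighted adjacency matrix. The similarity threshold below is an illustrative assumption, not a detail taken from the paper.

    # A minimal sketch of building a tf-idf similarity graph over paragraphs.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    paragraphs = [
        "The company reported record quarterly profits.",
        "Quarterly profits reached a record high, the firm said.",
        "Heavy rain caused flooding in the coastal towns.",
    ]

    # Represent each paragraph as a tf-idf vector and compute pairwise cosine
    # similarities, giving a dense weighted adjacency matrix over paragraphs.
    tfidf = TfidfVectorizer().fit_transform(paragraphs)
    graph = cosine_similarity(tfidf)

    # Optionally drop weak edges and self-loops so only clearly related
    # paragraphs remain connected (the 0.2 threshold is arbitrary here).
    graph[graph < 0.2] = 0.0
    np.fill_diagonal(graph, 0.0)
    print(np.round(graph, 2))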
Results
  • The authors evaluate the models on both the WikiSum and MultiNews datasets to validate their effectiveness on different types of corpora.
  • The second block shows the results of abstractive methods: (1) FT (Flat Transformer), a transformer-based encoder-decoder model on a flat token sequence; (2) T-DMCA, the best performing model of Liu et al (2018); (3) HT (Hierarchical Transformer), a model with a hierarchical transformer encoder and flat transformer decoder, proposed by Liu and Lapata (2019a).
  • The authors report their results following Liu and Lapata (2019a).
  • The authors compare the performance of RoBERTa+FT and GraphSum+RoBERTa, which shows that their model significantly improves on all metrics.
Conclusion
  • In this paper the authors explore the importance of graph representations in MDS and propose to leverage graphs to improve the performance of neural abstractive MDS.
  • The proposed model incorporates explicit graph representations into the document encoding process to capture richer relations within long inputs, and uses the explicit graph structure to guide the summary decoding process toward more informative, fluent and concise summaries (see the attention sketch after this list).
  • In the future, the authors would like to explore other, more informative graph representations such as knowledge graphs, and apply them to further improve summary quality.
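    One common way an explicit graph can steer Transformer encoding, in the spirit of the conclusion above, is to add a graph-dependent bias to the attention logits between paragraphs so that weakly connected pairs attend to each other less. The sketch below is a simplified illustration under that assumption, not the paper's exact formulation.

    # A minimal sketch of graph-informed attention between paragraph vectors.
    import numpy as np

    def graph_informed_attention(queries, keys, values, graph, alpha=1.0):
        """queries/keys/values: (n, d) paragraph vectors; graph: (n, n) edge weights in [0, 1]."""
        d = queries.shape[-1]
        logits = queries @ keys.T / np.sqrt(d)
        # Weak graph edges translate into large negative biases, so attention
        # concentrates on paragraph pairs that the graph marks as related.
        logits = logits + alpha * np.log(graph + 1e-9)
        weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ values

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))                      # four paragraphs, dim 8
    graph = np.clip(rng.random((4, 4)), 0.1, 1.0)    # toy similarity graph
    print(graph_informed_attention(x, x, x, graph).shape)  # (4, 8)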
Tables
  • Table1: Evaluation results on the WikiSum test set using ROUGE F1. R-1, R-2 and R-L are abbreviations for ROUGE-1, ROUGE-2 and ROUGE-L, respectively
  • Table2: Evaluation results on the MultiNews test set. We report the summary-level ROUGE-L value. The results of different graph types are also compared
  • Table3: Comparison of different input length on the WikiSum test set using ROUGE F1. ∇ indicates the improvements of GraphSum over HT
  • Table4: Ablation study on the WikiSum test set
  • Table5: Ranking results of system summaries by human evaluation. Systems are ranked from 1 (best) to 5 (worst); the reported rating is derived from these rankings, with a larger rating denoting better summary quality. R.B. and G.S. are the abbreviations of RoBERTa and GraphSum, respectively. ∗ indicates the overall ratings of the corresponding model are significantly (by Welch's t-test with p < 0.01) outperformed by our models GraphSum and GraphSum+RoBERTa
  • Table6: Evaluation results on the WikiSum test set with sentence-level ROUGE-L value
  • Table7: Evaluation results on the MultiNews test set with sentence-level ROUGE-L value
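    The ROUGE F1 scores reported in the tables above can be computed with a standard scorer; the sketch below assumes the rouge-score package (pip install rouge-score), which may differ from the exact evaluation toolkit and options used in the paper.

    # A minimal sketch of computing ROUGE-1/2/L F1 between a candidate summary
    # and a reference summary.
    from rouge_score import rouge_scorer

    reference = "officials ordered evacuations as wildfires spread across the region"
    candidate = "wildfires spread across the region and evacuations were ordered"

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    scores = scorer.score(reference, candidate)
    for name, result in scores.items():
        print(f"{name}: F1 = {result.fmeasure:.3f}")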
Related work
  • 2.1 Graph-based MDS

    Most previous MDS approaches are extractive: they extract salient textual units from documents based on graph representations of sentences. Various ranking methods have been developed to rank textual units over such graphs and select the most salient ones for inclusion in the final summary.

    Erkan and Radev (2004) propose LexRank to compute sentence importance based on a lexical similarity graph of sentences. Mihalcea and Tarau (2004) propose a graph-based ranking model to extract salient sentences from documents. Wan (2008) further proposes to incorporate document-level information and sentence-to-document relations into the graph-based ranking process. A series of variants of the PageRank algorithm has been developed along these lines for graph-based extractive summarization (a minimal ranking sketch follows this paragraph).
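    The sketch below shows a PageRank-style power iteration over a sentence similarity graph, in the spirit of LexRank and TextRank; the damping factor and convergence settings are illustrative assumptions rather than values from the cited papers.

    # A minimal sketch of PageRank-style sentence ranking on a similarity graph.
    import numpy as np

    def rank_sentences(similarity, damping=0.85, tol=1e-6, max_iter=100):
        """similarity: (n, n) nonnegative matrix of pairwise sentence similarities."""
        n = similarity.shape[0]
        # Row-normalize to get a transition matrix; rows with no edges fall back to uniform.
        row_sums = similarity.sum(axis=1, keepdims=True)
        transition = np.where(row_sums > 0, similarity / np.maximum(row_sums, 1e-12), 1.0 / n)
        scores = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            new_scores = (1 - damping) / n + damping * transition.T @ scores
            if np.abs(new_scores - scores).sum() < tol:
                break
            scores = new_scores
        return scores  # higher score = more central (more salient) sentence

    sim = np.array([[0.0, 0.8, 0.1],
                    [0.8, 0.0, 0.2],
                    [0.1, 0.2, 0.0]])
    print(rank_sentences(sim))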
Funding
  • This work was supported by the National Key Research and Development Project of China (No 2018AAA0101900)
References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations.
  • Siddhartha Banerjee, Prasenjit Mitra, and Kazunari Sugiyama. 2015. Multi-document abstractive summarization using ilp based multi-sentence compression. In Proceedings of 24th International Joint Conference on Artificial Intelligence.
  • Regina Barzilay. 2003. Information Fusion for Multidocument Summarization: Paraphrasing and Generation. Ph.D. thesis, Columbia University.
  • Regina Barzilay and Kathleen R McKeown. 2005. Sentence fusion for multidocument news summarization. Computational Linguistics, 31(3):297–328.
  • Tal Baumel, Matan Eyal, and Michael Elhadad. 2018. Query focused abstractive summarization: Incorporating query relevance, multi-document coverage, and summary length constraints into seq2seq models. arXiv preprint arXiv:1801.07704.
  • Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 481–490. Association for Computational Linguistics.
  • Lidong Bing, Piji Li, Yi Liao, Wai Lam, Weiwei Guo, and Rebecca J Passonneau. 2015. Abstractive multidocument summarization via phrase selection and merging. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 1, pages 1587– 1597.
  • David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan):993–1022.
  • Xiaoyan Cai and Wenjie Li. 2012. Mutually reinforced manifold-ranking based relevance propagation model for query-focused multi-document summarization. IEEE Transactions on Audio, Speech, and Language Processing, 20(5):1597–1607.
  • Asli Celikyilmaz, Antoine Bosselut, Xiaodong He, and Yejin Choi. 2018. Deep communicating agents for abstractive summarization. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 1662– 1675.
  • Janara Christensen, Stephen Soderland, Oren Etzioni, et al. 2013. Towards coherent multi-document summarization. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies, pages 1163–1173.
  • Eric Chu and Peter Liu. 2019. Meansum: a neural model for unsupervised multi-document abstractive summarization. In International Conference on Machine Learning, pages 1223–1232.
  • Trevor Anthony Cohn and Mirella Lapata. 2009. Sentence compression as tree transduction. Journal of Artificial Intelligence Research, 34:637–674.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 4171–4186.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), pages 13042–13054.
  • Sergey Edunov, Alexei Baevski, and Michael Auli. 2019. Pre-trained language model representations for language generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 4052– 4059.
  • Gunes Erkan and Dragomir R Radev. 2004. Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research, 22:457–479.
  • Alexander R Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir R Radev. 2019. Multi-news: a largescale multi-document summarization dataset and abstractive hierarchical model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1074–1084.
  • Angela Fan, Claire Gardent, Chloe Braud, and Antoine Bordes. 2019. Using local knowledge graph construction to scale seq2seq models to multi-document inputs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4184–4194.
  • Patrick Fernandes, Miltiadis Allamanis, and Marc Brockschmidt. 2018. Structured neural summarization. In Proceedings of the 7th International Conference on Learning Representations.
  • Katja Filippova and Michael Strube. 2008. Sentence fusion via dependency graph compression. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 177–185. Association for Computational Linguistics.
  • Sebastian Gehrmann, Yuntian Deng, and Alexander M Rush. 2018. Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4098–4109.
  • Pierre-Etienne Genest and Guy Lapalme. 2011. Framework for abstractive summarization using text-totext generation. In Proceedings of the Workshop on Monolingual Text-To-Text Generation, pages 64–73. Association for Computational Linguistics.
  • Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.
  • Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  • Logan Lebanoff, Kaiqiang Song, and Fei Liu. 2018. Adapting the neural encoder-decoder framework from single to multi-document summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4131–4141.
  • Wei Li. 2015. Abstractive multi-document summarization with semantic information extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1908– 1913.
  • Wei Li, Xinyan Xiao, Yajuan Lyu, and Yuanzhuo Wang. 2018a. Improving neural abstractive document summarization with explicit information selection modeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1787–1796.
  • Wei Li, Xinyan Xiao, Yajuan Lyu, and Yuanzhuo Wang. 2018b. Improving neural abstractive document summarization with structural regularization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4078– 4087.
  • Wei Li and Hai Zhuge. 2019. Abstractive multidocument summarization based on semantic link network. IEEE Transactions on Knowledge and Data Engineering.
  • Kexin Liao, Logan Lebanoff, and Fei Liu. 2018. Abstract meaning representation for multi-document summarization. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1178–1190.
  • Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, pages 605–612. Association for Computational Linguistics.
  • Peter J Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating wikipedia by summarizing long sequences. In Proceedings of the 6th International Conference on Learning Representations.
  • Yang Liu and Mirella Lapata. 2019a. Hierarchical transformers for multi-document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5070– 5081.
  • Yang Liu and Mirella Lapata. 2019b. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3728–3738.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Shulei Ma, Zhi-Hong Deng, and Yunlun Yang. 2016. An unsupervised multi-document summarization framework based on neural document model. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pages 1514–1523.
  • Rada Mihalcea and Paul Tarau. 2004. Textrank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411.
  • Shashi Narayan, Shay B Cohen, and Mirella Lapata. 2018. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807.
  • Vlad Niculae, Andre FT Martins, and Claire Cardie. 2018. Towards dynamic computation graphs via sparse latent structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 905–911.
  • Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In Proceedings of the 6th International Conference on Learning Representations.
  • Laura Perez-Beltrachini, Yang Liu, and Mirella Lapata. 2019. Generating summaries with topic templates and structured convolutional decoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5107–5116.
  • Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 2227–2237.
  • Daniele Pighin, Marco Cornolti, Enrique Alfonseca, and Katja Filippova. 2014. Modelling events through memory-based, open-ie patterns for abstractive summarization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, volume 1, pages 892–901.
  • Dragomir Radev. 2000. A common theory of information fusion from multiple text sources step one: cross-document structure. In 1st SIGdial workshop on Discourse and dialogue.
  • Sascha Rothe, Shashi Narayan, and Aliaksei Severyn. 2019. Leveraging pre-trained checkpoints for sequence generation tasks. arXiv preprint arXiv:1907.12461.
  • Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointergenerator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1073–1083.
  • Eva Sharma, Luyang Huang, Zhe Hu, and Lu Wang. 2019. An entity-driven framework for abstractive summarization. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 3278– 3289.
  • Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2019. Ernie 2.0: A continual pre-training framework for language understanding. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence.
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
  • Xiaojun Wan. 2008. An exploration of document impact on graph-based multi-document summarization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 755–762. Association for Computational Linguistics.
  • Xiaojun Wan and Jianguo Xiao. 2009. Graph-based multi-modality learning for topic-focused multidocument summarization. In Proceedings of the
  • Lu Wang and Claire Cardie. 2013. Domainindependent abstract generation for focused meeting summarization. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1395–1405.
  • Wenmian Yang, Weijia Jia, Wenyuan Gao, Xiaojie Zhou, and Yutao Luo. 2019a. Interactive variance attention based online spoiler detection for time-sync comments. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1241–1250. ACM.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019b. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems (NIPS 2019), pages 5754–5764.
  • Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan, and Dragomir Radev. 2017. Graph-based neural multi-document summarization. In Proceedings of the 21st Conference on Computational Natural Language Learning, pages 452–462.
  • Yongjing Yin, Linfeng Song, Jinsong Su, Jiali Zeng, Chulun Zhou, and Jiebo Luo. 2019. Graph-based neural sentence ordering. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 5387–5393.
  • Jianmin Zhang, Jiwei Tan, and Xiaojun Wan. 2018. Towards a neural network approach to abstractive multi-document summarization. arXiv preprint arXiv:1804.09010.