Deep Communicating Agents for Abstractive Summarization.

North American Chapter of the Association for Computational Linguistics (NAACL), 2018

Abstract

We present deep communicating agents in an encoder-decoder architecture to address the challenges of representing a long document for abstractive summarization. With deep communicating agents, the task of encoding a long text is divided across multiple collaborating agents, each in charge of a subsection of the input text. These encoders ...
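
As a rough illustration of the core idea (splitting the encoding of a long article across multiple collaborating agent encoders, whose outputs a single decoder can then attend over), the following PyTorch-style sketch shows one possible reading. It is not the authors' implementation: the class name, the equal-sized chunk split, and the omission of inter-agent message passing are simplifying assumptions for this example.

```python
# Illustrative sketch (not the authors' code): a long input is split across
# multiple "agent" encoders; a decoder would then attend over all agent outputs.
import torch
import torch.nn as nn


class MultiAgentEncoder(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, num_agents=3):
        super().__init__()
        self.num_agents = num_agents
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One BiLSTM per agent; each agent encodes only its own chunk.
        self.agents = nn.ModuleList([
            nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
            for _ in range(num_agents)
        ])

    def forward(self, token_ids):
        # token_ids: (batch, seq_len). Split into equal chunks, one per agent;
        # the paper distributes paragraphs/sentences, which is simplified here.
        chunks = torch.chunk(token_ids, self.num_agents, dim=1)
        outputs = []
        for agent, chunk in zip(self.agents, chunks):
            out, _ = agent(self.embed(chunk))   # (batch, chunk_len, 2*hid_dim)
            outputs.append(out)
        # Concatenate all agents' states for the decoder to attend over.
        return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    enc = MultiAgentEncoder(vocab_size=50000, num_agents=3)
    articles = torch.randint(1, 50000, (2, 800))   # batch of truncated articles
    print(enc(articles).shape)                     # torch.Size([2, 800, 512])
```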

Introduction
  • The authors focus on the task of abstractive summarization of a long document.
  • As recurrent neural networks (RNNs) are capable of generating fluent language, variants of encoder-decoder RNNs (Sutskever et al, 2014; Bahdanau et al, 2015) have shown promising results on the abstractive summarization task (Rush et al, 2015; Nallapati et al, 2017).
  • The motivation behind the approach is to be able to dynamically attend to different parts of the input to capture salient facts.
Highlights
  • We focus on the task of abstractive summarization of a long document
  • In contrast to extractive summarization, where a summary is composed of a subset of sentences or words lifted from the input text as is, abstractive summarization requires the generative ability to rephrase and restructure sentences to compose a coherent and concise summary
  • We show our results on the CNN/DailyMail and New York Times datasets in Tables 1 and 2, respectively
  • While weaker on ROUGE-L than the Reinforcement Learning model from Paulus et al (2018), the human evaluations in that work showed that their model received lower readability and relevance scores than a model trained with MLE, indicating the additional boost in ROUGE-L was not correlated with summary quality
  • We investigated the problem of encoding long text to generate abstractive summaries and demonstrated that the use of deep communicating agents can improve summarization by both automatic and manual evaluation
Methods
  • Datasets: The authors conducted experiments on two summarization datasets: CNN/DailyMail (Nallapati et al, 2017; Hermann et al, 2015) and New York Times (NYT) (Sandhaus, 2008).
  • Training details: During training and testing, the authors truncate each article to 800 tokens and limit the summary length to 100 tokens for training and 110 tokens at test time.
  • For multi-agent models, the authors distribute the truncated articles among agents, preserving paragraph and sentence boundaries where possible (a simplified preprocessing sketch follows this list).
  • For both datasets, the authors limit the input and output vocabulary size to the 50,000 most frequent tokens in the training set.
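
The truncation and agent-assignment step described above can be pictured with a short sketch. This is an illustrative outline only, assuming a pre-tokenized list of sentences; the function name split_among_agents and the budgeting heuristic are hypothetical, and the paper's actual preprocessing may differ in detail.

```python
# Illustrative sketch: truncate an article to a token budget and split it among
# agents at sentence boundaries (names and heuristics are hypothetical).
MAX_ARTICLE_TOKENS = 800        # length limits taken from the training details
MAX_SUMMARY_TOKENS_TRAIN = 100
MAX_SUMMARY_TOKENS_TEST = 110


def split_among_agents(sentences, num_agents=3, max_tokens=MAX_ARTICLE_TOKENS):
    """Assign whole sentences to agents until the token budget is used up,
    so that no sentence is split across two agents."""
    budget_per_agent = max_tokens // num_agents
    chunks, current, used = [], [], 0
    for sent in sentences:
        tokens = sent.split()
        if used + len(tokens) > budget_per_agent and current:
            chunks.append(current)
            current, used = [], 0
            if len(chunks) == num_agents:
                break                  # the rest of the article is truncated
        current.extend(tokens)
        used += len(tokens)
    if current and len(chunks) < num_agents:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    article = ["the first sentence of a news story .",
               "a second sentence with more details .",
               "a third sentence that may go to the next agent ."]
    for i, chunk in enumerate(split_among_agents(article, num_agents=3,
                                                 max_tokens=12)):
        print(f"agent {i}: {' '.join(chunk)}")
```
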
Results
  • 5.1 Quantitative Analysis

    The authors show the results on the CNN/DailyMail and NYT datasets in Tables 1 and 2, respectively.
  • While weaker on ROUGE-L than the RL model from Paulus et al (2018), the human evaluations in that work showed that their model received lower readability and relevance scores than a model trained with MLE, indicating the additional boost in ROUGE-L was not correlated with summary quality
  • This result may be explained by the best models being more abstractive.
Conclusion
  • The authors investigated the problem of encoding long text to generate abstractive summaries and demonstrated that the use of deep communicating agents can improve summarization by both automatic and manual evaluation.
  • Analysis demonstrates that this improvement is due to the improved ability of covering all and only salient concepts and maintaining semantic coherence in summaries.
Tables
  • Table 1: Comparison results on the CNN/DailyMail test set using the F1 variants of ROUGE. Best models are bolded
  • Table 2: Comparison results on the New York Times test set using the F1 variants of ROUGE. Best models are bolded
  • Table 3: Comparison of multi-agent models varying the number of agents, using ROUGE results of model (m7) from Table 1 on the CNN/DailyMail dataset
  • Table 4: Comparison of a human summary to the best single- and multi-agent model summaries, (m3) and (m7), from the CNN/DailyMail dataset. Although the single-agent model generates a coherent summary, it is less focused, contains more unnecessary details (highlighted in red), and misses key facts that the multi-agent model successfully captures (bolded)
  • Table 5: Head-to-head and score-based comparison of human evaluations on a random subset of the CNN/DM dataset. SA = single-agent, MA = multi-agent. ∗ indicates statistical significance at p < 0.001 for focus and p < 0.03 for the overall score
  • Table 6: Summary statistics of the CNN/DailyMail (DM) and New York Times (NYT) datasets
  • Table 7: In this example both the single- and multi-agent models demonstrate extractive behavior; however, each selects sentences from different sections of the document. While the single-agent model extracts the second and third sentences, the multi-agent model successfully selects salient sentences from further down in the document, specifically sentences 8 and 10. This can be attributed to the fact that agents can successfully encode salient aspects distributed across distant sections of the document. An interesting result is that even though the multi-agent model shows extractive behavior in this example, it successfully selects the most salient sentences, while the single-agent model includes superfluous details
  • Table 8: The baseline model generates a non-coherent summary that references the main character "Michelle Pfeiffer" in an ambiguous way towards the end of the generated summary. In contrast, the multi-agent model successfully captures the main character along with the key facts. One interesting feature the multi-agent model showcases is its simplification property, which accounts for its strength in abstraction. Specifically, it simplified the bold long sentence in the document starting with "couric will ..." and generated only the salient words
  • Table 9: The single-agent model generates a summary with superfluous details in which the facts are not clearly expressed. Although it captured the player's statistics correctly (e.g., 15 penalties, 16 attempts), it still missed the player who scored the only goal in the game (i.e., kevin mirallas). On the other hand, the multi-agent model was able to generate a concise summary with several key facts, although, like the single-agent model, it also missed the player who scored the only goal. Interestingly, the document contains the word "defeated", but the multi-agent model chose to use "beat" instead, which does not appear in the original document
Related Work
  • Several recent works investigate attention mechanisms for encoder-decoder models to sharpen the context that the decoder should focus on within the input encoding (Luong et al, 2015; Vinyals et al, 2015b; Bahdanau et al, 2015). For example, Luong et al (2015) propose global and local attention networks for machine translation, while others investigate hierarchical attention networks for document classification (Yang et al, 2016), sentiment classification (Chen et al, 2016), and dialog response selection (Zhou et al, 2016).

    Attention mechanisms have been shown to be crucial for summarization as well (Rush et al, 2015; Zeng et al, 2016; Nallapati et al, 2017), and pointer networks (Vinyals et al, 2015a), in particular, help address redundancy and saliency in generated summaries (Cheng and Lapata, 2016; See et al, 2017; Paulus et al, 2018; Fan et al, 2017). While we share the same motivation as these works, our work uniquely presents an approach based on CommNet, the deep communicating agent framework (Sukhbaatar et al, 2016). Compared to prior multi-agent works on logic puzzles (Foerster et al, 2017), language learning (Lazaridou et al, 2016; Mordatch and Abbeel, 2017), and StarCraft games (Vinyals et al, 2017), we present the first study in using this framework for long text generation.

    Finally, our model is related to prior works that address repetitions in generating long text. See et al (2017) introduce a post-trained coverage network to penalize repeated attentions over the same regions in the input, while Paulus et al (2018) use intra-decoder attention to penalize generating the same words. In contrast, we propose a new semantic coherence loss and intermediate sentence-based rewards for reinforcement learning to discourage semantically similar generations (§3).
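
The paragraph above contrasts coverage and intra-decoder attention with the paper's semantic coherence loss and sentence-based rewards. As a rough, assumption-laden illustration of the general idea (penalizing semantic redundancy between consecutive generated sentences), one could compute a cosine-similarity penalty over sentence representations. The sketch below is not the paper's exact formulation: using mean-pooled decoder states as sentence vectors and clamping negative similarities to zero are choices made only for this example.

```python
# Illustrative sketch (not the paper's exact loss): penalize high cosine
# similarity between consecutive generated sentences to discourage
# semantically repetitive output.
import torch
import torch.nn.functional as F


def semantic_coherence_penalty(decoder_states, sentence_spans):
    """decoder_states: (seq_len, hidden) decoder hidden states for one summary.
    sentence_spans: list of (start, end) index pairs, one per generated sentence."""
    # Mean-pool the states of each sentence into a single vector (an assumption).
    sent_vecs = [decoder_states[s:e].mean(dim=0) for s, e in sentence_spans]
    if len(sent_vecs) < 2:
        return decoder_states.new_zeros(())
    sims = [
        F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).squeeze()
        for a, b in zip(sent_vecs, sent_vecs[1:])
    ]
    # Higher similarity between adjacent sentences -> larger penalty.
    return torch.stack(sims).clamp(min=0).mean()


if __name__ == "__main__":
    states = torch.randn(30, 256)              # toy decoder states
    spans = [(0, 10), (10, 20), (20, 30)]      # three generated sentences
    print(semantic_coherence_penalty(states, spans))
```
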
References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Junyi Jessy Li, Kapil Thadani, and Amanda Stent. 2016. The role of discourse units in near-extractive summarization. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue.
  • Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In EMNLP.
  • Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
  • Igor Mordatch and Pieter Abbeel. 2017. Emergence of grounded compositional language in multi-agent populations. arXiv preprint arXiv:1703.04908.
  • Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In AAAI.
  • Ramakanth Pasunuru and Mohit Bansal. 2017. Reinforced video captioning with entailment rewards. In EMNLP.
  • Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In ICLR.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
  • Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2016. Self-critical sequence training for image captioning. arXiv preprint arXiv:1612.00563.
  • Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In EMNLP.
  • Evan Sandhaus. 2008. The New York Times annotated corpus. Linguistic Data Consortium, Philadelphia.
  • Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.
  • Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. 2016. Learning multiagent communication with backpropagation. In NIPS.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS.
  • Jiwei Tan, Xiaojun Wan, and Jianguo Xiao. 2017. Abstractive document summarization with a graph-based attentional neural model. In ACL.
  • O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. S. Vezhnevets, M. Yeo, A. Makhzani, H. Kuttler, J. Agapiou, J. Schrittwieser, et al. 2017. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782.
  • Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015a. Pointer networks. In NIPS.
  • Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015b. Grammar as a foreign language. In NIPS.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In NAACL.
  • Wenyuan Zeng, Wenjie Luo, Sanja Fidler, and Raquel Urtasun. 2016. Efficient summarization with read-again and copy mechanism. arXiv preprint arXiv:1611.03382.
  • Xiangyang Zhou, Daxiang Dong, Hua Wu, Shiqi Zhao, D. Yu, R. Yan, Xuan Liu, and H. Tian. 2016. Multi-view response selection for human-computer conversation. In EMNLP.