Integrating Document Clustering and Multidocument Summarization

TKDD(2011)

引用 90|浏览53
暂无评分
摘要
Document understanding techniques such as document clustering and multidocument summarization have been receiving much attention recently. Current document clustering methods usually represent the given collection of documents as a document-term matrix and then conduct the clustering process. Although many of these clustering methods can group the documents effectively, it is still hard for people to capture the meaning of the documents since there is no satisfactory interpretation for each document cluster. A straightforward solution is to first cluster the documents and then summarize each document cluster using summarization methods. However, most of the current summarization methods are solely based on the sentence-term matrix and ignore the context dependence of the sentences. As a result, the generated summaries lack guidance from the document clusters. In this article, we propose a new language model to simultaneously cluster and summarize documents by making use of both the document-term and sentence-term matrices. By utilizing the mutual influence of document clustering and summarization, our method makes; (1) a better document clustering method with more meaningful interpretation; and (2) an effective document summarization method with guidance from document clustering. Experimental results on various document datasets show the effectiveness of our proposed method and the high interpretability of the generated summaries.
更多
查看译文
关键词
clustering process,document clustering,integrating document clustering,various document datasets,document understanding technique,document cluster,current document,effective document summarization method,clustering method,better document,sentence-term matrix,multidocument summarization,language model,context dependent,nonnegative matrix factorization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要