An Effective Scheme for Generating An Overview Report over A Very Large Corpus of Documents.

DocEng (2019)

Abstract
Efficiently generating an accurate, well-structured overview report (ORPT) over thousands of documents is challenging. A well-structured ORPT is divided into sections of multiple levels (e.g., a two-level structure consists of sections and subsections). None of the existing multi-document summarization (MDS) algorithms is suitable for this task. To overcome this obstacle, we devise NDORGS (Numerous Documents' Overview Report Generation Scheme), which integrates text filtering, keyword scoring, single-document summarization (SDS), topic modeling, MDS, and title generation to generate a coherent, well-structured ORPT. We then present a multi-criteria evaluation method that applies text mining and multi-attribute decision making to a combination of human judgments, running time, information coverage, and topic diversity. We evaluate ORPTs generated by NDORGS on two large corpora of documents, one classified and the other unclassified. We show that, using Saaty's pairwise comparison 9-point scale and TOPSIS, the ORPTs generated from single-document summaries whose length is 20% of that of the original documents are the best overall on both datasets.
Keywords
document generation, topic clustering, multi-document summarization, TOPSIS
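
The abstract names Saaty's pairwise comparison 9-point scale and TOPSIS as the ranking tools but does not spell out the computation. The following is a minimal sketch of how such a ranking could work, not the authors' implementation. It assumes criterion weights are approximated by the row geometric means of the pairwise comparison matrix (a common stand-in for the principal eigenvector) and that TOPSIS uses vector normalization. The function names `ahp_weights` and `topsis`, the pairwise matrix, and all scores in the demo are hypothetical illustration values, not data from the paper.

```python
import math


def ahp_weights(pairwise):
    """Approximate priority weights from a reciprocal pairwise comparison matrix.

    Assumption: the row geometric mean method is used in place of the
    principal eigenvector.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix[i][j] is the raw score of alternative i on criterion j.
    benefit[j] is True if larger is better, False if smaller is better.
    Assumes no column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    # Ideal and anti-ideal points depend on whether a criterion is a benefit or a cost.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores


if __name__ == "__main__":
    # Criteria order: human judgment, running time, information coverage, topic diversity.
    # Hypothetical Saaty-scale judgments (1 = equal importance, 9 = extreme importance).
    pairwise = [
        [1, 5, 3, 3],
        [1 / 5, 1, 1 / 3, 1 / 3],
        [1 / 3, 3, 1, 1],
        [1 / 3, 3, 1, 1],
    ]
    weights = ahp_weights(pairwise)
    # Hypothetical scores for three candidate reports; running time is a cost criterion.
    candidates = [
        [7.5, 120.0, 0.62, 0.40],
        [8.1, 300.0, 0.70, 0.45],
        [6.9, 60.0, 0.55, 0.38],
    ]
    benefit = [True, False, True, True]
    for name, score in zip("ABC", topsis(candidates, weights, benefit)):
        print(name, round(score, 3))
```

A candidate that is best on every criterion receives a score of 1 and one that is worst on every criterion receives 0; intermediate candidates are ranked by how much closer they sit to the ideal point than to the anti-ideal point.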