MLSUM: The Multilingual Summarization Corpus

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Conference on Empirical Methods in Natural Language Processing (2020)

Abstract

We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian, and Turkish. Together with English news articles from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These analyses highlight existing biases, which motivate the use of a multilingual dataset.

Keywords

multilingual summarization corpus
