HunSum-1: an Abstractive Summarization Dataset for Hungarian

CoRR (2023)

Abstract
We introduce HunSum-1: a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis on the models' results. The HunSum-1 dataset, all models used in our experiments and our code are available open source.
Keywords
abstractive summarization dataset