CC-News-En: A Large English News Corpus

CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020, pp. 3077-3084.

Abstract:

We describe a static, open-access news corpus built from data provided by the Common Crawl Foundation, which publishes free, publicly available web archives, including a continuous crawl of international news articles in multiple languages. Our derived corpus, CC-News-En, contains 44 million English documents collected between September 2016 an...
