CC-News-En: A Large English News Corpus
CIKM '20: The 29th ACM International Conference on Information and Knowledge Management Virtual Event Ireland October, 2020, pp. 3077-3084, 2020.
We describe a static, open-access news corpus using data from the Common Crawl Foundation, who provide free, publicly available web archives, including a continuous crawl of international news articles published in multiple languages. Our derived corpus, CC-News-En, contains 44 million English documents collected between September 2016 an...More
PPT (Upload PPT)