W2E: A Worldwide-Event Benchmark Dataset for Topic Detection and Tracking

CIKM (2018)

Abstract
Topic detection and tracking in document streams is a critical task in many important applications and has therefore attracted research interest in recent decades. Given the large size of data streams, a number of works have proposed automatic methods for the task, using different approaches. However, only a few small benchmark datasets are publicly available for evaluating the proposed methods. The lack of large datasets with fine-grained ground truth implicitly restrains the development of more advanced methods. In this work, we address this issue by collecting and publishing W2E, a large dataset consisting of news articles from more than 50 prominent mass media channels worldwide. The articles cover a large set of popular events within a full year. W2E is more than 15 times larger than TREC's TDT2 dataset, which is widely used in prior work. We further conduct exploratory analysis to examine the dynamics and diversity of W2E and propose potential uses of the dataset in other research.
Keywords
Topic detection, topic tracking, benchmark dataset