Is Data Collection through Twitter Streaming API Useful for Academic Research?

Alina Campan, Tobel Atnafu, Traian Marius Truta, Joseph Nolan

2018 IEEE International Conference on Big Data (Big Data), 2018

Abstract

In this paper we study the reliability of Twitter data collection when the Streaming API is used with filtering. For this purpose, we designed a series of experiments that use the free version of the Twitter Streaming API. We ran our experiments between June 24 and July 15, 2018, using soccer terms for filtering because of the popularity of the FIFA World Cup, which was held at that time. Our experiments showed that when the filter terms are not very popular, Twitter likely provides all matching Tweets; in this case, analyzing those Tweets will give reliable results for research purposes. We also concluded that concurrent processes collecting filtered Tweets for very popular terms tend to return almost the same Tweets, even when the obtained Tweet set is only a sample of all possible keyword-matching Tweets. We believe that the sampling in the filtering process is deterministic, and that using Tweets collected for very popular filter terms may lead to biased results because of this non-random sampling process.

Keywords

Twitter, Streaming API, data collection, sampling
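
The abstract reports that concurrent collectors return almost the same Tweets but does not describe how that agreement was measured. As a minimal sketch, assuming each collector's output has already been reduced to a set of Tweet IDs, one plausible way to quantify agreement is pairwise Jaccard similarity. The collector names, the Tweet IDs, and the helper functions below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two ID sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(collections: dict) -> dict:
    """Map each pair of collector names to the Jaccard similarity of their ID sets."""
    return {
        (x, y): jaccard(collections[x], collections[y])
        for x, y in combinations(sorted(collections), 2)
    }


# Hypothetical Tweet IDs gathered by three concurrent collectors
# that all filtered on the same keyword (illustrative data only).
collections = {
    "collector_a": {101, 102, 103, 104},
    "collector_b": {101, 102, 103, 105},
    "collector_c": {101, 102, 103, 104},
}

overlap = pairwise_overlap(collections)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

Scores close to 1.0 across all pairs would be consistent with the near-identical samples described in the abstract. On its own, such a score says nothing about whether the shared sample is representative of all matching Tweets.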
