Comparing Methods for Creating a National Random Sample of Twitter Users

Meysam Alizadeh, Darya Zare, Zeynab Samei, Mohammadamin Alizadeh, Mael Kubli, Mohammadhadi Aliahmadi, Sarvenaz Ebrahimi, Fabrizio Gilardi

CoRR (2024)

Abstract

Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four methods to collect a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. Then, we compare the methods according to their tweet- and user-level metrics as well as their accuracy in estimating the US population with and without using inclusion probabilities of various demographics. Our results show that the 1% Stream method performs better than the others and is best for the construction of a population-representative sample, though its statistical significance is questionable due to large confidence intervals. We discuss the conditions under which the 1% Stream method may not be suitable and suggest the Bounding Box method as the second-best method to use.
