Comparing Methods for Creating a National Random Sample of Twitter Users

Meysam Alizadeh, Darya Zare, Zeynab Samei, Mohammadamin Alizadeh, Mael Kubli, Mohammadhadi Aliahmadi, Sarvenaz Ebrahimi, Fabrizio Gilardi

CoRR (2024)

Abstract
Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is to construct a random sample of users from a given country. However, although several methods have been proposed in the literature, their comparative performance remains largely unexplored. In this paper, we implement four methods to collect a random sample of Twitter users in the US: 1% Stream, Bounding Box, Location Query, and Language Query. We then compare the methods on tweet- and user-level metrics, as well as on their accuracy in estimating the US population with and without using inclusion probabilities of various demographics. Our results show that the 1% Stream method is more accurate than the others and the best for constructing a population-representative sample, though the statistical significance of this result is questionable due to large confidence intervals. We discuss the conditions under which the 1% Stream method may not be suitable and suggest the Bounding Box method as the second-best option.
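The abstract does not say exactly how inclusion probabilities enter the population estimates. One common approach, assumed here and not confirmed by the source, is inverse-probability weighting (a Hájek-style estimator): each sampled user is weighted by 1/π for their demographic group, where π is the probability that a member of that group ends up in the sample. The sketch below compares unweighted and weighted group shares against a benchmark using mean absolute error. The function names, the age groups, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_shares(sample):
    """Share of each demographic group in the raw sample (no weighting)."""
    counts = defaultdict(int)
    for group in sample:
        counts[group] += 1
    n = len(sample)
    return {g: c / n for g, c in counts.items()}


def ipw_shares(sample, inclusion_prob):
    """Hajek-style estimate: weight each sampled user by 1 / pi(group),
    then normalize the weights so the shares sum to 1."""
    totals = defaultdict(float)
    for group in sample:
        totals[group] += 1.0 / inclusion_prob[group]
    total_weight = sum(totals.values())
    return {g: w / total_weight for g, w in totals.items()}


def mean_abs_error(estimate, benchmark):
    """Mean absolute difference between estimated and benchmark shares,
    taken over the groups in the benchmark (missing groups count as 0)."""
    errors = [abs(estimate.get(g, 0.0) - share) for g, share in benchmark.items()]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical sample: younger users are over-represented on the platform.
    sample = ["18-29"] * 60 + ["30+"] * 40
    # Hypothetical inclusion probabilities per group.
    pi = {"18-29": 0.03, "30+": 0.01}
    # Hypothetical population benchmark (e.g. census shares).
    benchmark = {"18-29": 0.3, "30+": 0.7}

    raw = unweighted_shares(sample)
    weighted = ipw_shares(sample, pi)
    print("unweighted:", raw, "MAE:", round(mean_abs_error(raw, benchmark), 3))
    print("weighted:  ", weighted, "MAE:", round(mean_abs_error(weighted, benchmark), 3))
```

With these made-up inputs, weighting moves the "18-29" share from 0.6 toward one third, which is closer to the benchmark. A real comparison would also need confidence intervals for each estimate, which the abstract notes were large.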