Comparing Methods for Creating a National Random Sample of Twitter Users
CoRR(2024)
Abstract
Twitter data has been widely used by researchers across various social and
computer science disciplines. A common aim when working with Twitter data is
the construction of a random sample of users from a given country. However,
while several methods have been proposed in the literature, their comparative
performance is mostly unexplored. In this paper, we implement four methods to
collect a random sample of Twitter users in the US: 1% Stream, Bounding Box,
Location Query, and Language Query. We then compare the methods on their
tweet- and user-level metrics and on their accuracy in estimating the US
population, with and without inclusion probabilities for various
demographics. Our results show that the 1% Stream method performs differently
than the others and is best suited to constructing a population-representative
sample, though its statistical significance is questionable due to large
confidence intervals. We discuss the conditions under which the 1% Stream
method may not be suitable and suggest the Bounding Box method as the
second-best method to use.