
Is the Data Suitable? The Comparison of Keyword Versus Location Filters in Crisis Informatics Using Twitter Data

International Journal of Information Management Data Insights (2022)

Twitter is an increasingly popular platform for understanding and analyzing human sociobehavioral dynamics, information diffusion, and public sentiment, largely because of its high volume of content and relative ease of access. This has been particularly true in the field of crisis informatics, in which tweet content and social networks are used to study a community's response to a disaster or crisis event. Twitter's Terms of Service, however, restrict the quantity and types of data that can be collected from the platform. Accordingly, there are multiple ways to retrieve data from Twitter, with no consensus among researchers as to standard data collection procedures. In this work, we compare two tweet datasets gathered around Hurricane Harvey, the second-most expensive US hurricane on record, via different methods and show the significant role of the tweet retrieval source on study insights and results. One dataset was collected using keywords to filter relevant data, the other using geographical location. We find that while keyword-based data is better suited to tracking public engagement and identifying information brokers, location-based data is needed to characterize local situational information and communication behaviors. Both datasets also have limitations. For example, keyword-based data may underestimate connections among users, while location-based data does not capture sentiment from a broad audience. Hence, in the spirit of advocating for increased transparency and reproducibility of data-driven research, we provide a set of data reporting guidelines based on our findings, to serve as a first step towards trustworthy crisis informatics.
Social media, Crisis informatics, Data collection, Transparency
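
The contrast between the two retrieval methods described in the abstract can be illustrated with a minimal sketch. It is not the authors' collection pipeline and calls no Twitter API: the `Tweet` record, the keyword list, the bounding box around Houston, and the sample tweets are all hypothetical, chosen only to show how the two filters select different subsets of the same stream.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (not Twitter's actual schema)."""
    text: str
    coords: Optional[Tuple[float, float]] = None  # (longitude, latitude) if geotagged


# Assumed values for illustration only; the paper's actual filters may differ.
KEYWORDS = ("harvey", "hurricane")
BBOX = (-96.0, 29.0, -94.5, 30.5)  # west, south, east, north (rough Houston area)


def matches_keywords(tweet: Tweet, keywords=KEYWORDS) -> bool:
    """Keyword filter: keep tweets whose text mentions any keyword."""
    text = tweet.text.lower()
    return any(k in text for k in keywords)


def in_bbox(tweet: Tweet, bbox=BBOX) -> bool:
    """Location filter: keep geotagged tweets that fall inside the bounding box."""
    if tweet.coords is None:
        return False
    lon, lat = tweet.coords
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# Made-up sample stream.
tweets = [
    Tweet("Praying for everyone affected by Hurricane Harvey"),             # remote, no geotag
    Tweet("Street is flooded, need a boat on our block", (-95.37, 29.76)),  # local, no keyword
    Tweet("Harvey shelter open at the convention center", (-95.36, 29.75)), # local, keyword
    Tweet("Great coffee this morning", (-73.98, 40.75)),                    # unrelated
]

keyword_set = [t for t in tweets if matches_keywords(t)]
location_set = [t for t in tweets if in_bbox(t)]
overlap = [t for t in keyword_set if in_bbox(t)]
```

In this toy stream the keyword filter keeps the remote expression of support but misses the local flood report that never names the storm, while the location filter does the opposite. That mirrors the trade-off the abstract reports between broad public engagement and local situational information.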