TwitchChat: A Dataset for Exploring Livestream Chat.

AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2020

Abstract
Most natural language processing research focuses on modelling and understanding text formed of complete sentences with correct spelling and grammar. However, livestream chat is drastically different. Viewers typically write short messages while responding to in-stream events, often with incorrect grammar and many repeated tokens. Additionally, tokens that are commonly used in livestream chat are unknown to traditional language understanding efforts that focus on prosaic text. To advance and encourage further research in livestream chat understanding, we present a large-scale dataset of video game livestream chat, consisting of over 60 million tokens. As livestreaming becomes more popular, it is also increasingly pertinent to study, through chat analysis, the way in which the audience engages with the stream. However, this is not a straightforward task: livestream chat is a rich and complex domain, far removed from often-studied prosaic text. Additionally, we provide a case study analysis of word vector methods applied to the dataset, showing that the vector space is strangely shaped but clusterable and that the resulting clusters correlate with features such as streamer popularity. Furthermore, human relatedness tests highlight the difference that this domain poses with respect to prosaic text. It is hoped that the livestream chat dataset, the discussion of its unique features, and the challenges highlighted for future work will invigorate the research community into further study of livestream chat.