Hashtag-Based Tweet Expansion for Improved Topic Modeling

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS(2023)

引用 0|浏览0
暂无评分
摘要
Topic modeling on tweets is known to experience under-specificity and data sparsity due to its character limitation. In earlier studies, researchers attempted to address this problem by either 1) tweet aggregation, where related tweets are combined into a single document or 2) tweet expansion with related text from external sources. The first approach faces the problem of losing the topic distribution in individual tweets. While finding a relevant text from the external source for a random tweet in the second approach is challenging for various reasons like differences in writing styles, multilingual content, and informal text. In contrast to adding context from external resources or combining related tweets into a pool, this study uses the internal vocabulary (hashtags) to counter under-specificity and sparsity in tweets. Earlier studies have indicated hashtags to be an important feature for representing the underlying context present in the tweet. Sequential models like Bi-directional Long Short Term Memory (BiLSTM) and Convolution Neural Network (CNN) over distributed representation of words have shown promising results in capturing semantic relationships between words of a tweet in the past. Motivated by the above, this article proposes a unified framework of hashtag-based tweet expansion exploiting text-based and network-based representation learning methods such as BiLSTM, BERT, and Graph Convolution Network (GCN). The hashtag-based expanded tweets using the proposed framework have significantly improved topic modeling performance compared to un-expanded (raw) tweets and hashtag-pooling-based approaches over two real-world tweet datasets of different nature. Furthermore, this article also studies the significance of hashtags in topic modeling performance by experimenting with different combination of word types such as hashtags, keywords, and user mentions.
更多
查看译文
关键词
Feature extraction, Internet, Encyclopedias, Semantics, Online services, Task analysis, Social networking (online), BERT, Bi-directional Long Short Term Memory (BiLSTM), Co-occurrence Frequency Expansion (CoFE), Graph Convolution Network (GCN), hashtags, Latent Dirichlet Allocation (LDA), topic modeling, tweet expansion, twitter
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要