Short Text Classification Based on Latent Topic Modeling and Word Embedding

DEStech Transactions on Computer Science and Engineering (2017)

Abstract
With the rapid development of social networks and e-commerce, we are exposed to an enormous amount of short text every day, ranging from tweets, movie comments, and search snippets to news summaries. Classifying short, sparse text accurately is a basic requirement for handling this information efficiently. However, previous methods fail to achieve high performance because their text representations are sparse and carry little semantic meaning. The key to a breakthrough lies in an appropriate representation of the words, and we build a new framework around it. By discovering the latent topics in related data crawled from the web, we obtain a topic distribution that describes the general content of a text. Combined with word embeddings generated from universal online data, the proposed representation is denser and contains semantic information from two different aspects. With this semantic representation of the texts, the framework greatly outperforms previous methods even when using the most common SVM classifier, improving accuracy by 11.58% on a standard data set.
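The abstract does not say how the topic distribution and the word embeddings are combined. As a minimal sketch only, assuming the two are simply concatenated and that the word vectors of a text are averaged, the feature construction could look like the following. The function names, the toy embeddings, and the toy topic distribution are all hypothetical and are not taken from the paper; in practice the topic distribution would come from a trained topic model and the result would be passed to a classifier such as an SVM.

```python
from typing import Dict, List, Sequence


def mean_embedding(tokens: Sequence[str],
                   embeddings: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens (zeros if none are known)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) iterates over the embedding dimensions column by column.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combined_representation(tokens: Sequence[str],
                            topic_dist: Sequence[float],
                            embeddings: Dict[str, List[float]],
                            dim: int) -> List[float]:
    """Concatenate a topic distribution with the averaged word embedding.

    Assumption: plain concatenation; the paper may combine them differently.
    """
    return list(topic_dist) + mean_embedding(tokens, embeddings, dim)


# Toy inputs (hypothetical): 2-dimensional embeddings and a 2-topic distribution.
toy_embeddings = {"movie": [1.0, 0.0], "great": [0.0, 1.0]}
toy_topics = [0.75, 0.25]

# "unseen" is out of vocabulary and is skipped when averaging.
features = combined_representation(
    ["great", "movie", "unseen"], toy_topics, toy_embeddings, dim=2
)
print(features)
```

The resulting vector has one entry per topic followed by one entry per embedding dimension, so its length stays fixed regardless of how short the text is.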