Topic classification based on distributed document representation and latent topic information.
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (2017)
Abstract
Classical bag-of-words and probabilistic topic models are widely used for topic classification. Recently, neural networks have achieved remarkable performance and become the mainstream approach, owing to their ability to encode distributed semantic features of documents from word embeddings. To demonstrate the advantage of neural networks, this paper compares Latent Dirichlet Allocation (LDA) with three mainstream neural network architectures: the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Recurrent Convolutional Neural Network (RCNN). Beyond this comparison, we combine the latent topic information inferred by LDA with the distributed semantic information learned by neural networks to generate a better document representation for topic classification. The experimental results show that the proposed representation outperforms each individual system and achieves strong performance on topic classification tasks.
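The abstract does not say how the two sources of information are combined. As a minimal sketch, assuming plain concatenation of an LDA topic distribution with a neural document vector followed by a softmax classifier, the idea might look like the following. The function names, vector sizes, weights, and numeric values are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from typing import List, Sequence


def combine_representations(topic_dist: Sequence[float],
                            neural_vec: Sequence[float]) -> List[float]:
    """Concatenate a topic distribution with a neural document vector.

    Assumption: simple concatenation; the paper may combine them differently.
    """
    return list(topic_dist) + list(neural_vec)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Linear layer plus softmax: one weight row and one bias per class."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


# Hypothetical inputs: a 3-topic LDA distribution and a 2-dimensional
# document vector standing in for the output of a CNN, RNN, or RCNN encoder.
topic_dist = [0.7, 0.2, 0.1]
neural_vec = [0.5, -1.0]
features = combine_representations(topic_dist, neural_vec)

# Hypothetical classifier parameters for two topic classes.
weights = [
    [1.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.5],
]
biases = [0.0, 0.0]
probs = classify(features, weights, biases)
predicted_class = probs.index(max(probs))
```

In practice the topic distribution would come from a trained LDA model and the document vector from a trained encoder, with the classifier weights learned jointly or in a separate step.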
Keywords
distributed document representation, latent topic information, bag-of-words, probabilistic topic models, neural networks, word embeddings, Recurrent Convolutional Neural Network, Latent Dirichlet Allocation, topic classification, text classification tasks