Topic Structure-Aware Neural Language Model: Unified language model that maintains word and topic ordering by their embedded representations

WWW '19: The Web Conference, San Francisco, CA, USA, May 2019

Cited by 6 | Viewed 16
Abstract
Our goal is to build a unified language model that precisely explains the generative process of documents in terms of their semantic and topic structures. Because existing methods model documents in disparate ways, we expect that coordinating them will achieve this goal more effectively than using any of them in isolation; we therefore combine topic models, embedding models, and neural language models. Focusing on the fact that topic models can be shared with, and indeed complement, embedding models and neural language models, we propose Word and topic 2 vec (Wat2vec) and the Topic Structure-Aware Neural Language Model (TSANL). Wat2vec treats topics as global semantic information and words as local semantic information, and embeds both topics and words in the same vector space. TSANL uses recurrent neural networks to capture long-range dependencies over topics and words. Whereas existing topic models demand time-consuming learning and scale poorly because they break document structure such as the order of words and topics, TSANL maintains the order of words as phrases and the order of topics as segments. TSANL reduces computation cost and memory usage by feeding the topic recurrent neural networks and topic-specific word networks with these embedding representations. Experiments show that TSANL captures both segments and topical phrases, and so improves on previous models.
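To make the architecture described in the abstract concrete, the following is a minimal sketch (not the authors' code; all class, parameter, and dimension names are hypothetical) of a topic structure-aware RNN language model in PyTorch: words and topics share one embedding table, a topic-level GRU tracks the segment-level topic sequence, and a word-level GRU conditioned on the current topic embedding predicts the next word.

```python
# Hypothetical sketch of a topic-aware neural language model, assuming:
# - words and topics are embedded in the same space (one shared embedding table),
# - a topic-level RNN models the order of segment topics,
# - a word-level RNN, conditioned on the topic embedding, models the order of words.
import torch
import torch.nn as nn

class TopicAwareLM(nn.Module):
    def __init__(self, vocab_size, num_topics, emb_dim=128, hidden_dim=256):
        super().__init__()
        # Shared table: rows [0, vocab_size) are words, rows [vocab_size, ...) are topics.
        self.embed = nn.Embedding(vocab_size + num_topics, emb_dim)
        self.vocab_size = vocab_size
        # Topic-level RNN over the sequence of segment topics.
        self.topic_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        # Word-level RNN; input is the word embedding concatenated with its topic embedding.
        self.word_rnn = nn.GRU(emb_dim * 2, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_ids):
        # word_ids:  (batch, seq_len) word indices
        # topic_ids: (batch, seq_len) topic index assigned to each position's segment
        word_emb = self.embed(word_ids)                      # (B, T, E)
        topic_emb = self.embed(topic_ids + self.vocab_size)  # (B, T, E), offset into topic rows
        topic_ctx, _ = self.topic_rnn(topic_emb)             # segment/topic-level context
        word_ctx, _ = self.word_rnn(torch.cat([word_emb, topic_emb], dim=-1))
        # Combine word-level and topic-level context before the softmax over the vocabulary.
        return self.out(word_ctx + topic_ctx)                # (B, T, vocab_size)

# Illustrative usage with toy sizes: next-word logits for two documents of 20 tokens.
model = TopicAwareLM(vocab_size=10000, num_topics=50)
words = torch.randint(0, 10000, (2, 20))
topics = torch.randint(0, 50, (2, 20))
logits = model(words, topics)  # shape (2, 20, 10000)
```

This sketch only illustrates the general structure implied by the abstract; the paper's actual segmentation of documents into topic segments and phrases, and its training procedure, are not reproduced here.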
Keywords
Word embeddings, Topic Models, Recurrent Neural Network, Neural Language Model