Topically-Informed Bilingually-Constrained Recursive Autoencoders For Statistical Machine Translation

COMMUNICATIONS IN INFORMATION AND SYSTEMS (2018)

Abstract
Learning high-quality phrase vector representations is one of the important research topics in statistical machine translation (SMT). To learn phrase embeddings, most existing work mainly explores syntactic and semantic clues among the words within a phrase; these clues alone are insufficient for representation learning because they lack context information. In this paper, we propose topically informed bilingually-constrained recursive autoencoders for SMT, which substantially extend the conventional bilingually-constrained recursive autoencoders by exploiting latent topics in two ways. First, we introduce topical contexts to induce topical phrase embeddings. Second, word topic assignments from a latent topic model are leveraged to constrain the learning of word and topic embeddings, both of which form the basis of contextual phrase embedding learning in the proposed model. Experimental results on Chinese-English translation show that the proposed model significantly improves translation quality on NIST test sets.
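
The general idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: it is a generic recursive autoencoder in plain Python in which a topic vector is appended to the two child vectors at every composition step. The class name `TopicalRAE`, the left-to-right composition order, the dimensions, the random initialisation, and the use of a plain squared distance as the bilingual constraint are all assumptions made for illustration. No training is performed; the code only shows the forward pass and the two error terms.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def affine_tanh(W, b, x):
    """Compute tanh(W x + b) for a list-of-lists matrix W."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


class TopicalRAE:
    """Hypothetical recursive autoencoder with a topic vector as extra input."""

    def __init__(self, dim, topic_dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.topic_dim = topic_dim
        # Encoder maps [left; right; topic] to a parent vector of size dim.
        self.W_enc = make_matrix(dim, 2 * dim + topic_dim, rng)
        self.b_enc = [0.0] * dim
        # Decoder maps the parent back to [left'; right'].
        self.W_dec = make_matrix(2 * dim, dim, rng)
        self.b_dec = [0.0] * (2 * dim)

    def compose(self, left, right, topic):
        """Merge two child vectors; return (parent, reconstruction error)."""
        parent = affine_tanh(self.W_enc, self.b_enc, left + right + topic)
        recon = affine_tanh(self.W_dec, self.b_dec, parent)
        target = left + right
        err = 0.5 * sum((r - t) ** 2 for r, t in zip(recon, target))
        return parent, err

    def encode_phrase(self, word_vecs, topic):
        """Fold word vectors left to right (a fixed tree, assumed for brevity)."""
        node = word_vecs[0]
        total_err = 0.0
        for vec in word_vecs[1:]:
            node, err = self.compose(node, vec, topic)
            total_err += err
        return node, total_err


def semantic_distance(src_vec, tgt_vec):
    """Squared distance between two phrase vectors, standing in for the
    bilingual constraint (a simplification: no cross-lingual mapping)."""
    return 0.5 * sum((s - t) ** 2 for s, t in zip(src_vec, tgt_vec))


if __name__ == "__main__":
    rng = random.Random(1)
    dim, topic_dim = 4, 2
    src_model = TopicalRAE(dim, topic_dim, seed=0)
    tgt_model = TopicalRAE(dim, topic_dim, seed=1)
    topic = [rng.uniform(-1, 1) for _ in range(topic_dim)]
    src_words = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
    tgt_words = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
    src_vec, src_err = src_model.encode_phrase(src_words, topic)
    tgt_vec, tgt_err = tgt_model.encode_phrase(tgt_words, topic)
    print("reconstruction errors:", src_err, tgt_err)
    print("bilingual distance:", semantic_distance(src_vec, tgt_vec))
```

In a trained system the reconstruction errors and the bilingual distance would be combined into one objective and minimised jointly; how the paper weights these terms and how it derives the topic vector from the latent topic model is not stated in the abstract, so the sketch leaves both open.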