Cross-Lingual Text Similarity Exploiting Neural Machine Translation Models

JOURNAL OF INFORMATION SCIENCE(2021)

Abstract
This article studies cross-lingual text similarity using neural machine translation models. A straightforward approach is to translate the text so that the problem becomes monolingual. Another approach, recently proposed in related work, uses the intermediate states of machine translation models, which could avoid propagating translation errors. We aim to improve both approaches independently and then combine the two types of information, that is, translations and intermediate states, in a learning-to-rank framework to compute cross-lingual text similarity. To evaluate the effectiveness and generalisability of our approach, we conduct experiments on English-Japanese and English-Hindi translation corpora for a cross-lingual sentence retrieval task. The results show that our approach using translations and intermediate states outperforms other neural network-based approaches and is comparable even with a strong baseline based on a state-of-the-art machine translation system.
Keywords
Cross-lingual information retrieval, distributed representation, document similarity, neural network, sentence embedding
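
The idea of combining a translation-based signal with an intermediate-state signal to rank candidate sentences can be illustrated with a small sketch. This is not the authors' implementation: the token-overlap feature, the cosine feature, the fixed weights, and all names (`rank_candidates`, `w_translation`, `w_state`, the toy vectors) are assumptions made for illustration. In a learning-to-rank setting the weights would be learned from training data rather than set by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Token-overlap similarity between two whitespace-tokenised strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rank_candidates(query, candidates, w_translation=0.5, w_state=0.5):
    """Rank candidates by a weighted sum of two similarity features.

    query["translation"]: the query translated into the candidate language
    query["state"]:       a vector standing in for an intermediate model state
    The weights are hypothetical placeholders for learned parameters.
    """
    scored = []
    for cand in candidates:
        f_translation = jaccard(query["translation"], cand["text"])
        f_state = cosine(query["state"], cand["state"])
        score = w_translation * f_translation + w_state * f_state
        scored.append((score, cand["id"]))
    # Highest combined score first.
    return [cand_id for _, cand_id in sorted(scored, reverse=True)]


# Toy data: the vectors are made up and do not come from a real model.
query = {"translation": "the cat sat on the mat", "state": [0.9, 0.1, 0.0]}
candidates = [
    {"id": "a", "text": "a dog ran in the park", "state": [0.0, 0.2, 0.9]},
    {"id": "b", "text": "the cat sat on a mat", "state": [0.8, 0.2, 0.1]},
]
ranking = rank_candidates(query, candidates)
print(ranking)
```

Here candidate "b" ranks first because it scores higher on both features. A linear combination is only one possible scoring function; the abstract does not specify which learning-to-rank model is used.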