Bilingual word embedding fusion for robust unsupervised bilingual lexicon induction

Inf. Fusion (2023)

Abstract
Great progress has been made in unsupervised bilingual lexicon induction (UBLI) by aligning source and target word embeddings that are independently trained on monolingual corpora. The common assumption of most UBLI models is that the embedding spaces of the two languages are approximately isomorphic (i.e., similar in geometric structure). Performance is therefore bounded by the degree of isomorphism, especially for etymologically and typologically distant languages, for which near-zero UBLI results have been reported. To address this problem, we propose a method that increases isomorphism through bilingual word embedding fusion. In particular, features from the source embeddings are integrated into the target embeddings, and vice versa, so that the resulting source and target embedding spaces have similar structures. The method requires no form of supervision and can be applied to any language pair. On a benchmark bilingual lexicon induction dataset, our approach achieves competitive or superior performance compared to state-of-the-art methods, with particularly strong results on distant languages.
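To illustrate the general idea of cross-lingual embedding fusion described above, here is a minimal sketch: each word vector is blended with its nearest neighbour in the other language's (roughly aligned) space, so that both spaces share features and become more structurally similar. The function name `fuse_embeddings`, the mixing weight `alpha`, and the nearest-neighbour blending rule are illustrative assumptions, not the authors' actual procedure, which is not specified in the abstract.

```python
import numpy as np

def l2_normalize(E):
    """Row-wise L2 normalisation of an embedding matrix."""
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-8)

def fuse_embeddings(X, Y, alpha=0.5):
    """Blend each source vector with its nearest target vector, and vice versa.

    X: (n_src, d) source-language embeddings, roughly aligned to the target space
       (e.g., by an unsupervised mapping step).
    Y: (n_tgt, d) target-language embeddings.
    alpha: mixing weight; alpha=1.0 leaves both spaces unchanged.
    """
    Xn, Yn = l2_normalize(X), l2_normalize(Y)
    # Cosine similarities between the two vocabularies.
    sim = Xn @ Yn.T                      # shape (n_src, n_tgt)
    nn_tgt = sim.argmax(axis=1)          # nearest target word for each source word
    nn_src = sim.argmax(axis=0)          # nearest source word for each target word
    # Inject cross-lingual features: each vector is pulled toward its neighbour
    # in the other space, making the two geometries more alike.
    X_fused = alpha * Xn + (1 - alpha) * Yn[nn_tgt]
    Y_fused = alpha * Yn + (1 - alpha) * Xn[nn_src]
    return l2_normalize(X_fused), l2_normalize(Y_fused)
```

A lexicon-induction step (e.g., nearest-neighbour or CSLS retrieval) would then be run on the fused embeddings rather than the original ones.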
Keywords
Unsupervised learning, Word translation, Unsupervised bilingual lexicon induction, Embedding fusion, Information fusion