Improving linear orthogonal mapping based cross-lingual representation using ridge regression and graph centrality

Computer Speech & Language(2024)

引用 0|浏览0
暂无评分
摘要
Orthogonal linear mapping is a commonly used approach for generating cross-lingual embedding between two monolingual corpora that uses a word frequency-based seed dictionary alignment approach. While this approach is found to be effective for isomorphic language pairs, they do not perform well for distant language pairs with different sentence structures and morphological properties. For a distance language pair, the existing frequency-aligned orthogonal mapping methods suffer from two problems - (i)the frequency of source and target word are not comparable, and (ii)different word pairs in the seed dictionary may have different contribution. Motivated by the above two concerns, this paper proposes a novel centrality-aligned ridge regression-based orthogonal mapping. The proposed method uses centrality-based alignment for seed dictionary selection and ridge regression framework for incorporating influential weights of different word pairs in the seed dictionary. From various experimental observations over five language pairs (both isomorphic and distant languages), it is evident that the proposed method outperforms baseline methods in the Bilingual Dictionary Induction(BDI) task, Sentence Retrieval Task(SRT), and Machine Translation. Further, several analyses are also included to support the proposed method.
更多
查看译文
关键词
CLWE,Orthogonal linear mapping,Ridge regression,Centrality
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要