Training Code-Switching Language Model with Monolingual Data

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Cited by 11 | Views 33
Abstract
The lack of code-switching data is an issue when training code-switching language models. In this paper, we propose an approach to train code-switching language models with monolingual data only. By constraining and normalizing the output projection matrix of an RNN-based language model, we make the embeddings of different languages close to each other. With numerical and visualized results, we show that the proposed approaches remarkably improve code-switching language modeling trained from monolingual data. The proposed approaches are comparable to, or even better than, training a code-switching language model with artificially generated code-switching data. Furthermore, we use unsupervised bilingual word translation to analyze whether semantically equivalent words in different languages are mapped together.
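One way to realize the constraint the abstract describes is to L2-normalize each row of the output projection matrix before computing logits, so that every word in the shared bilingual vocabulary scores against the hidden state through a unit-length vector. The sketch below illustrates this idea in PyTorch; the class name NormalizedOutputRNNLM and all hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedOutputRNNLM(nn.Module):
    """RNN language model with a normalized output projection matrix.

    Each vocabulary word (from either language) is scored via a
    unit-length output embedding, one concrete form of "constraining
    and normalizing the output projection matrix".
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Output projection matrix W: one row per word in the shared
        # bilingual vocabulary.
        self.output_proj = nn.Parameter(torch.randn(vocab_size, hidden_dim) * 0.01)

    def forward(self, token_ids):
        x = self.embed(token_ids)                  # (batch, time, embed_dim)
        h, _ = self.rnn(x)                         # (batch, time, hidden_dim)
        # Normalize every output embedding to unit length before the
        # dot product, restricting word vectors to the unit hypersphere.
        w = F.normalize(self.output_proj, dim=-1)  # (vocab_size, hidden_dim)
        logits = h @ w.t()                         # (batch, time, vocab_size)
        return logits
```

Training would then proceed with standard cross-entropy over the shared vocabulary, using only monolingual sentences from each language; since every output vector has unit norm, neither language can dominate the logit scale, which encourages the two languages' embeddings to occupy a common space.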
Keywords
Code-Switching, Language Model