Improving Code-Switched Language Modeling Performance Using Cognate Features

INTERSPEECH (2019)

Abstract

We have found that cognate words, defined as sets of words used in multiple languages that share a common etymology, can elicit code-switching, or language mixing, between those languages. This paper focuses on how information about cognate words can improve language modeling performance on code-switched English-Spanish (EN-ES) language. We have found that the degree of semantic, phonetic, or lexical overlap between a pair of cognate words is a useful feature for identifying code-switching in language. We derive a set of spelling, phonetic, and semantic features from a list of EN-ES cognates and run experiments on a corpus of conversational code-switched EN-ES. First, we show that there is a strong statistical relationship between these cognate-based features and code-switching in the corpus. Second, we demonstrate that language models using these features obtain performance improvements similar to those obtained with manually tagged features, including language and part-of-speech tags. We conclude that cognate features are a useful set of automatically derived features that can be easily obtained for any pair of languages.
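
As an illustration of what a spelling-overlap feature for a cognate pair might look like, the sketch below scores an EN-ES word pair by normalized edit distance. The abstract does not specify how the spelling features are computed, so the choice of Levenshtein distance, the normalization by the longer word, and the function names `levenshtein` and `spelling_overlap` are all assumptions made for this example, not the authors' method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def spelling_overlap(en_word: str, es_word: str) -> float:
    """Hypothetical spelling feature: 1.0 for identical spellings, 0.0 for no overlap.

    Assumption: similarity is 1 minus edit distance over the longer word's length.
    """
    en, es = en_word.lower(), es_word.lower()
    longest = max(len(en), len(es))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(en, es) / longest


if __name__ == "__main__":
    for en, es in [("animal", "animal"), ("family", "familia"), ("night", "noche")]:
        print(en, es, round(spelling_overlap(en, es), 3))
```

A score like this could be attached to each token as one input feature alongside phonetic and semantic overlap scores; how the features are combined in the language model is not described in the abstract.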


Keywords

language modeling, code-switching, cognates
