Improving Code-Switched Language Modeling Performance Using Cognate Features

INTERSPEECH (2019)

Abstract
We have found that cognate words, defined as sets of words used in multiple languages that share a common etymology, can elicit code-switching, or language mixing, between those languages. This paper focuses on how information about cognate words can improve language modeling performance on code-switched English-Spanish (EN-ES) language. We have found that the degree of semantic, phonetic, or lexical overlap between a pair of cognate words is a useful feature for identifying code-switching. We derive a set of spelling, phonetic, and semantic features from a list of EN-ES cognates and run experiments on a corpus of conversational code-switched EN-ES. First, we show that there is a strong statistical relationship between these cognate-based features and code-switching in the corpus. Second, we demonstrate that language models using these features obtain performance improvements similar to those obtained with manually tagged features, including language and part-of-speech tags. We conclude that cognate features are a useful set of automatically derived features that can be easily obtained for any pair of languages.
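To make the idea of a spelling-overlap feature concrete, the following is a minimal sketch, assuming normalized Levenshtein similarity as the measure. The abstract does not specify how the spelling, phonetic, or semantic features are computed, so the function names `levenshtein` and `spelling_overlap` and the choice of measure are illustrative assumptions, not the authors' method.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def spelling_overlap(en_word: str, es_word: str) -> float:
    """Hypothetical spelling feature in [0, 1]; 1.0 means identical spelling.

    Assumption: similarity = 1 - edit_distance / length of the longer word.
    """
    a, b = en_word.lower(), es_word.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for en, es in [("animal", "animal"), ("family", "familia")]:
        print(en, es, round(spelling_overlap(en, es), 3))
```

A score like this could be attached to each token in a cognate pair as an additional input feature for a language model; phonetic or semantic overlap would need other resources (for example, pronunciation dictionaries or word embeddings), which this sketch does not cover.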
Keywords
language modeling, code-switching, cognates