Correction to: Two Sepedi-English Code-Switched Speech Corpora

Language Resources and Evaluation (2022)

Abstract

We report on the development of two reference corpora for the analysis of Sepedi-English code-switched speech in the context of automatic speech recognition. For the first corpus, possible English events were obtained from an existing corpus of transcribed Sepedi-English speech. The second corpus is based on the analysis of radio broadcasts: actual instances of code switching were transcribed and reproduced by a number of native Sepedi speakers. We describe the process used to develop and verify both corpora and perform an initial analysis of the newly produced data sets. We find that, in naturally occurring speech, the frequency of code switching is unexpectedly high for this language pair, and that the continuum of code switching (from unmodified embedded words to loanwords absorbed into the matrix language) makes this a particularly challenging task for speech recognition systems.

Keywords

Code switching, Speech corpus, Multilingual speech recognition, Sepedi