Machine Translation Based Data Augmentation For Cantonese Keyword Spotting

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016)

Abstract
This paper presents a method for improving the language model of a low-resource language by using statistical machine translation from a related language to generate text in the target language. The translation model is trained on a corpus of parallel Mandarin-Cantonese subtitles and used to translate a large set of Mandarin conversational telephone transcripts into Cantonese, for which little transcribed data is available. The translated transcripts are then used to train a more robust language model for speech recognition and keyword search in Cantonese conversational telephone speech. With this method, the keyword search system detects 1.5 times more out-of-vocabulary words and achieves a 1.7% absolute improvement in actual term-weighted value (ATWV).
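
One way to see why translated text can help with out-of-vocabulary keywords is a toy vocabulary-coverage check. The sketch below is only an illustration under stated assumptions, not the paper's pipeline: the helper names (`build_vocab`, `oov_keywords`) and the placeholder sentences are hypothetical, tokens are assumed to be whitespace-separated, and no translation or language-model training is performed. It simply pools "translated" text with native text and counts which keywords still contain an unseen word.

```python
from collections import Counter


def build_vocab(sentences):
    """Count word types in a list of whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def oov_keywords(keywords, vocab):
    """Return the keywords that contain at least one word missing from vocab."""
    return [kw for kw in keywords if any(w not in vocab for w in kw.split())]


# Placeholder data (hypothetical, not taken from the paper).
native = ["a b c", "b c d"]        # stands in for scarce target-language transcripts
translated = ["d e f", "a f g"]    # stands in for machine-translated transcripts
keywords = ["a b", "e f", "g h"]   # stands in for the keyword list

base_vocab = build_vocab(native)
aug_vocab = build_vocab(native + translated)

oov_before = oov_keywords(keywords, base_vocab)
oov_after = oov_keywords(keywords, aug_vocab)

print("OOV keywords before augmentation:", oov_before)
print("OOV keywords after augmentation:", oov_after)
```

In this toy setup the augmented vocabulary covers one keyword that the native text alone does not; a real system would additionally retrain the language model on the pooled text and re-run keyword search.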
Keywords
keyword spotting,data augmentation,language modelling,neural networks,low-resourced languages