Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition

ACL(2007)

Abstract
Speech recognition in many morphologically rich languages suffers from a very high out-of-vocabulary (OOV) ratio. Earlier work has shown that vocabulary decomposition methods can practically solve this problem for a subset of these languages. This paper compares various vocabulary decomposition approaches to open vocabulary speech recognition, using Estonian speech recognition as a benchmark. Comparisons are performed utilizing large models of 60000 lexical items and smaller vocabularies of 5000 items. A large vocabulary model based on a manually constructed morphological tagger is shown to give the lowest word error rate, while the unsupervised morphology discovery method Morfessor Baseline gives marginally weaker results. Only the Morfessor-based approach is shown to adequately scale to smaller vocabulary sizes.
Keywords
speech recognition,decomposition method
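
To illustrate the OOV problem named in the abstract, the following minimal sketch compares the OOV ratio of a word-level vocabulary with that of a sub-word vocabulary. Everything in it is a hypothetical toy: the word lists, the hand-written `segmentation` table, and the helper `oov_rate` are assumptions made for illustration and are not taken from the paper, its data, or the output of Morfessor.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical toy data (not from the paper): a tiny word-level vocabulary
# and a handful of test words with inflected forms.
word_vocab = {"maja", "majas", "auto"}
test_words = ["maja", "majas", "autos", "majad"]

# Hypothetical hand-written segmentation into stem + suffix units.
# A real system would obtain this from a morphological analyzer or an
# unsupervised segmentation method instead.
segmentation = {
    "maja": ["maja"],
    "majas": ["maja", "s"],
    "auto": ["auto"],
    "autos": ["auto", "s"],
    "majad": ["maja", "d"],
}

# Sub-word vocabulary: every unit that occurs in the known words.
unit_vocab = {unit for word in word_vocab for unit in segmentation[word]}

# Test text rewritten as a sequence of sub-word units.
test_units = [unit for word in test_words for unit in segmentation[word]]

word_oov = oov_rate(test_words, word_vocab)   # 2 of 4 words are unseen
unit_oov = oov_rate(test_units, unit_vocab)   # 1 of 7 units is unseen

print(f"word-level OOV: {word_oov:.3f}")
print(f"unit-level OOV: {unit_oov:.3f}")
```

In this toy setup the unseen word "autos" becomes covered once it is split into units that already occur in known words, while "majad" still contributes one unseen unit. This is only meant to show why decomposition can lower the OOV ratio; it says nothing about the word error rates reported in the paper.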