谷歌浏览器插件
订阅小程序
在清言上使用

Phone Inventories and Recognition for Every Language.

International Conference on Language Resources and Evaluation (LREC)(2022)

引用 0|浏览33
暂无评分
摘要
Identifying phone inventories is a crucial component in language documentation and the preservation of endangered languages. However, even the largest collection of phone inventory only covers about 2000 languages, which is only 1/4 of the total number of languages in the world. A majority of the remaining languages are endangered. In this work, we attempt to solve this problem by estimating the phone inventory for any language listed in Glottolog, which contains phylogenetic information regarding 8000 languages. In particular, we propose one probabilistic model and one non-probabilistic model, both using phylogenetic trees ("language family trees") to measure the distance between languages. We show that our best model outperforms baseline models by 6.5 F1. Furthermore, we demonstrate that, with the proposed inventories, the phone recognition model can be customized for every language in the set, which improved the PER (phone error rate) in phone recognition by 25%.
更多
查看译文
关键词
Phone Inventory, Multilingual Speech Recognition, Endangered Languages, Language Documentation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要