LRRo: a lip reading data set for the under-resourced Romanian language
MMSys '20: 11th ACM Multimedia Systems Conference, Istanbul, Turkey, June 2020
Abstract
Automatic lip reading is a challenging and important research topic, as it makes it possible to transcribe visual-only recordings of a speaker into editable text. Such technology has many useful applications, from assisting hearing-impaired people to improving general automatic speech recognition. In this paper, we introduce and publicly release lip reading resources for the Romanian language. Two distinct collections are proposed: (i) the wild LRRo data, designed for an Internet in-the-wild, ad-hoc scenario, with more than 35 different speakers, 1.1k words, a vocabulary of 21 words, and more than 20 hours of data; and (ii) the lab LRRo data, which addresses a lab-controlled scenario for more accurate data, with 19 different speakers, 6.4k words, a vocabulary of 48 words, and more than 5 hours of data. This is the first resource available for Romanian lip reading and is intended to serve as a pioneering foundation for this under-resourced language. Moreover, given that word-level models are not strongly language dependent, these resources will also contribute to the general lip-reading task via transfer learning. To provide validation and a reference for future developments, we propose two strong baselines based on the state-of-the-art VGG-M and Inception-V4 deep network architectures.
Keywords
visual speech recognition, lip reading, under-resourced languages, annotated data set, Romanian language