Analysis Of Acoustic-To-Articulatory Speech Inversion Across Different Accents And Languages
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction (2017)
Abstract
The focus of this paper is estimating articulatory movements of the tongue and lips from acoustic speech data. While such a method has several potential applications in speech therapy and pronunciation training, the performance of acoustic-to-articulatory inversion systems is not very high due to the limited availability of simultaneous acoustic and articulatory data, substantial speaker variability, and variable methods of data collection. This paper therefore evaluates the impact of speaker, language and accent variability on the performance of an acoustic-to-articulatory speech inversion system. The articulatory dataset used in this study consists of 21 Dutch speakers reading Dutch and English words and sentences, and 22 UK English speakers reading English words and sentences. We trained several acoustic-to-articulatory speech inversion systems, based on both deep and shallow neural network architectures, to estimate electromagnetic articulography (EMA) sensor positions as well as vocal tract variables (TVs). Our results show that, with appropriate feature and target normalization, a speaker-independent speech inversion system trained on data from one language is able to estimate sensor positions (or TVs) for the same language correlating at about r = 0.53 with the actual sensor positions (or TVs). Cross-language results show a reduced performance of r = 0.47.
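To make the evaluation metric concrete, the following is a minimal sketch of how per-speaker normalization and a correlation score of the kind reported above could be computed. The abstract does not specify the normalization scheme or how correlations are aggregated, so the per-speaker z-scoring, the averaging over channels, and the function names `zscore_per_speaker` and `mean_pearson_r` are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def zscore_per_speaker(frames, speaker_ids):
    """Standardize each column using statistics of the same speaker's frames.

    frames: 2-D array (n_frames, n_channels) of features or targets.
    speaker_ids: 1-D array (n_frames,) giving the speaker of each frame.
    Assumption: per-speaker z-scoring; the paper may normalize differently.
    """
    frames = np.asarray(frames, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(frames)
    for speaker in np.unique(speaker_ids):
        mask = speaker_ids == speaker
        mean = frames[mask].mean(axis=0)
        std = frames[mask].std(axis=0)
        std[std == 0] = 1.0  # leave constant channels at zero instead of dividing by zero
        normalized[mask] = (frames[mask] - mean) / std
    return normalized


def mean_pearson_r(predicted, actual):
    """Pearson correlation per channel, averaged over channels.

    Assumption: a single summary r is obtained by averaging over channels.
    Channels with zero variance would yield NaN and should be excluded upstream.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    p = predicted - predicted.mean(axis=0)
    a = actual - actual.mean(axis=0)
    r = (p * a).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (a ** 2).sum(axis=0))
    return float(r.mean())
```

In such a setup, `zscore_per_speaker` would be applied to both the acoustic features and the articulatory targets before training, and `mean_pearson_r` would compare the estimated and measured trajectories on held-out speakers.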
Keywords
Acoustic-to-articulatory speech inversion, Electromagnetic articulography, Tract variables, Cross-accent speech inversion, Cross domain speech inversion