AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech.

Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin W. Wilson, Rif A. Saurous, D. Sculley

arXiv: Computation and Language (2016)

Abstract

Developers of text-to-speech synthesizers (TTS) often make use of human raters to assess the quality of synthesized speech. We demonstrate that we can model human raters' mean opinion scores (MOS) of synthesized speech using a deep recurrent neural network whose inputs consist solely of a raw waveform. Our best models provide utterance-level estimates of MOS only moderately inferior to sampled human ratings, as shown by Pearson and Spearman correlations. When multiple utterances are scored and averaged, a scenario common in synthesizer quality assessment, AutoMOS achieves correlations approaching those of human raters. The AutoMOS model has a number of applications, such as the ability to explore the parameter space of a speech synthesizer without requiring a human-in-the-loop.
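The two evaluation granularities named in the abstract, per-utterance scores and scores averaged over several utterances, can be illustrated with a small sketch. This is not the authors' evaluation code: the function names, the choice of grouping key (a synthesizer label) and all numbers below are hypothetical, and the correlations are implemented with the standard textbook definitions using only the Python standard library.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def group_means(groups, scores):
    """Mean score per group, returned in sorted group-key order."""
    buckets = defaultdict(list)
    for g, s in zip(groups, scores):
        buckets[g].append(s)
    return [sum(buckets[g]) / len(buckets[g]) for g in sorted(buckets)]


# Hypothetical toy data: three synthesizers, three utterances each.
systems = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
human_mos = [3.1, 3.6, 3.3, 4.0, 4.4, 4.1, 2.2, 2.9, 2.5]
predicted_mos = [3.4, 3.2, 3.5, 4.2, 3.9, 4.3, 2.6, 2.4, 2.7]

# Utterance-level agreement between predictions and human scores.
print("utterance Pearson :", pearson(human_mos, predicted_mos))
print("utterance Spearman:", spearman(human_mos, predicted_mos))

# Agreement after averaging the utterances of each synthesizer.
human_avg = group_means(systems, human_mos)
predicted_avg = group_means(systems, predicted_mos)
print("averaged Pearson  :", pearson(human_avg, predicted_avg))
```

Averaging within a group cancels part of the per-utterance noise in both the human and the predicted scores, which is one plausible reason correlations computed on averages tend to be higher than utterance-level ones.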
