Ensemble of Deep Neural Network Models for MOS Prediction

Marie Kunešová, Jindřich Matoušek, Jan Lehečka, Jan Švec, Josef Michálek, Daniel Tihelka, Martin Bulín, Zdeněk Hanzlíček, Markéta Řezáčková

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract

Automatic evaluation of the quality of synthetic speech has the potential to serve as a cheaper and less time-consuming alternative to standard listening tests. In this paper, we present our contribution to the ongoing research: a system for automatic prediction of the mean opinion score (MOS) given by human listeners. The system was developed specifically for the recent VoiceMOS Challenge. Following the success of fusion systems in similar challenges, our contribution is an ensemble that interpolates the outputs of seven different models: four different wav2vec models, a CNN-RNN model, QuartzNet, and the LDNet baseline. In the VoiceMOS Challenge, our system achieved the second-best utterance-level MSE of 0.171 and placed between 2nd and 8th among all 22 participating teams on the other evaluation metrics.
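
As a rough illustration of what interpolating model outputs and scoring by utterance-level MSE can look like, the sketch below combines per-utterance MOS predictions from several models with a weighted average and then computes the mean squared error against listener scores. The abstract does not say how the interpolation weights are chosen, so the function names, the weights, and the toy scores here are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def interpolate_predictions(
    model_scores: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of per-model MOS predictions, one value per utterance.

    model_scores[m][i] is model m's prediction for utterance i.
    The weights are hypothetical; they are normalized to sum to 1 here.
    """
    if len(model_scores) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_utterances = len(model_scores[0])
    if any(len(scores) != n_utterances for scores in model_scores):
        raise ValueError("all models must score the same utterances")
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n_utterances)
    ]


def utterance_level_mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and listener MOS, per utterance."""
    if len(predicted) != len(target) or not target:
        raise ValueError("predicted and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Toy example: two hypothetical models, two utterances, equal weights.
toy_scores = [[3.0, 4.0], [4.0, 2.0]]
toy_ensemble = interpolate_predictions(toy_scores, weights=[1.0, 1.0])
toy_mse = utterance_level_mse(toy_ensemble, target=[3.5, 4.0])
```

In practice the weights would typically be tuned on held-out data rather than set equal; that choice is an assumption of this sketch, not something the abstract states.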

Keywords

MOS prediction, speech quality assessment, speech synthesis, mean opinion score
