Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)
Abstract
This study proposes a multi-task pseudo-label learning (MPL)-based
non-intrusive speech quality assessment model called MTQ-Net. MPL consists of
two stages: obtaining pseudo-label scores from a pretrained model and
performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS),
Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The
pretrained MOSA-Net model is utilized to estimate three pseudo labels:
perceptual evaluation of speech quality (PESQ), short-time objective
intelligibility (STOI), and speech distortion index (SDI). Multi-task learning
is then employed to train MTQ-Net by combining a supervised loss (derived from
the difference between the estimated score and the ground-truth label) and a
semi-supervised loss (derived from the difference between the estimated score
and the pseudo label), where the Huber loss is employed as the loss function.
Experimental results first demonstrate the advantages of MPL over both
training a model from scratch and using a direct knowledge transfer mechanism.
Second, the benefit of the Huber loss for improving the predictive ability of
MTQ-Net is verified. Finally, MTQ-Net trained with the MPL approach exhibits higher
overall predictive power than other self-supervised learning (SSL)-based speech
assessment models.
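
The multi-task objective described in the abstract can be illustrated with a minimal sketch. It assumes the total loss is a weighted sum of a supervised Huber term (on S-MOS, N-MOS, G-MOS against ground-truth labels) and a semi-supervised Huber term (on PESQ, STOI, SDI against pseudo labels). The function names, the weight `alpha`, and the threshold `delta` are hypothetical choices for illustration; the abstract does not state how the two terms are weighted or which Huber threshold is used.

```python
def huber(pred, target, delta=1.0):
    """Huber loss for one value: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


def mean_huber(pred, labels, delta=1.0):
    """Average Huber loss over the metrics named in `labels`."""
    return sum(huber(pred[k], labels[k], delta) for k in labels) / len(labels)


def mpl_loss(pred, ground_truth, pseudo, alpha=1.0, delta=1.0):
    """Combined loss: supervised term plus weighted semi-supervised term.

    pred         -- model outputs for all six metrics
    ground_truth -- 3QUEST labels (S-MOS, N-MOS, G-MOS)
    pseudo       -- pseudo labels from a pretrained model (PESQ, STOI, SDI)
    alpha        -- hypothetical weight on the pseudo-label term (assumption)
    """
    supervised = mean_huber(pred, ground_truth, delta)
    semi_supervised = mean_huber(pred, pseudo, delta)
    return supervised + alpha * semi_supervised


# Illustrative values only; not taken from the paper.
pred = {"S-MOS": 3.0, "N-MOS": 3.0, "G-MOS": 3.0,
        "PESQ": 2.0, "STOI": 0.8, "SDI": 0.1}
ground_truth = {"S-MOS": 3.5, "N-MOS": 3.0, "G-MOS": 5.0}
pseudo = {"PESQ": 2.0, "STOI": 0.8, "SDI": 0.1}
total = mpl_loss(pred, ground_truth, pseudo)
```

In this sketch the G-MOS error of 2.0 exceeds `delta` and is penalized linearly rather than quadratically, which is the general property that makes the Huber loss less sensitive to outlying labels than a squared-error loss.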
Keywords
3QUEST, PESQ, STOI, SDI, speech quality prediction, speech intelligibility prediction, self-supervised learning