Efficient Personalized Speech Enhancement Through Self-Supervised Learning

IEEE Journal of Selected Topics in Signal Processing (2022)

Abstract
This work presents self-supervised learning methods for monaural speaker-specific (i.e., personalized) speech enhancement models. While general-purpose models must broadly address many speakers, a personalized model can adapt to a particular speaker's voice and is expected to solve a narrower problem. Hence, personalization can achieve better performance while also reducing computational complexity. However, naive personalization methods require clean speech from the target user, which can be inconvenient to collect, e.g., due to subpar recording conditions. To address this, we pose personalization as either a zero-shot task, in which no clean speech of the target speaker is used, or a few-shot learning task, which minimizes the duration of clean speech used for transfer learning. In this paper, we propose self-supervised learning methods as a solution to both the zero- and few-shot personalization tasks. The proposed methods learn personalized speech features from unlabeled data (i.e., in-the-wild noisy recordings from the target user) rather than from clean sources. We investigate three different self-supervised learning mechanisms. First, we set up a pseudo speech enhancement problem as a pretext task, which pretrains the models to estimate the noisy speech as if it were the clean target. Contrastive learning and data purification methods then regularize the loss function of the pseudo enhancement problem, overcoming the limitations of learning from unlabeled data. We assess our methods by personalizing the well-known ConvTasNet architecture to twenty different target speakers. The results show that self-supervision-based personalization improves on the original ConvTasNet's enhancement quality with fewer model parameters and less clean data from the target user.
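To make the pretext task concrete, the sketch below illustrates one plausible form of pseudo speech enhancement: an in-the-wild noisy recording serves as the pseudo-clean target, the model receives that recording contaminated with additional noise, and training minimizes a reconstruction loss between the model output and the pseudo target. This is a minimal sketch, not the paper's implementation: the function names (`si_snr_loss`, `pseudo_se_step`), the choice of SI-SNR as the loss, the tensor shapes, and the tiny `Conv1d` stand-in for ConvTasNet are all assumptions for illustration.

```python
import torch

def si_snr_loss(estimate: torch.Tensor, target: torch.Tensor,
                eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SNR, a common speech enhancement loss.

    Assumed loss choice for illustration; operates on the last (time) axis.
    """
    target = target - target.mean(dim=-1, keepdim=True)
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to get the scaled reference.
    scale = (estimate * target).sum(dim=-1, keepdim=True) / (
        target.pow(2).sum(dim=-1, keepdim=True) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    ratio = s_target.pow(2).sum(dim=-1) / (e_noise.pow(2).sum(dim=-1) + eps)
    return -10 * torch.log10(ratio + eps).mean()

def pseudo_se_step(model: torch.nn.Module, noisy_speech: torch.Tensor,
                   extra_noise: torch.Tensor) -> torch.Tensor:
    """One pretext-task step (hypothetical helper).

    The unlabeled noisy recording plays the role of the clean target;
    the model input is that recording with further contamination added.
    """
    pseudo_input = noisy_speech + extra_noise
    estimate = model(pseudo_input)
    return si_snr_loss(estimate, noisy_speech)

if __name__ == "__main__":
    torch.manual_seed(0)
    # Tiny length-preserving stand-in for a ConvTasNet-style enhancer.
    model = torch.nn.Conv1d(1, 1, kernel_size=9, padding=4)
    noisy = torch.randn(4, 1, 16000)         # in-the-wild noisy recordings
    noise = 0.3 * torch.randn(4, 1, 16000)   # additional contaminating noise
    loss = pseudo_se_step(model, noisy, noise)
    loss.backward()
    print(f"pretext loss: {loss.item():.2f}")
```

In this reading, no clean speech from the target user is ever needed for pretraining; the contrastive learning and data purification mechanisms described in the abstract would act as additional regularizers on this loss, which the sketch does not attempt to reproduce.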
Keywords
Data efficiency, model complexity, personalized speech enhancement, self-supervised learning