A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis

Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2019)

Abstract

In this paper, we investigate how the performance of a deep learning-based speech synthesis (DLSS) system varies with the configuration of its output acoustic parameters. Our method is mainly applicable to vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the source-filter model's assumption that the spectral and fundamental frequency (F0) parameters are independent, we propose a reliable network architecture for training acoustic parameters. In particular, the F0 parameter fluctuates strongly and has an extremely low number of dimensions. To alleviate these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome a lack of training data, which is a frequent issue with multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
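The abstract does not say how the context-window approach is built. One plausible reading, offered here only as an illustrative assumption, is to widen the low-dimensional F0 target by stacking the F0 values of neighboring frames (frames t-k..t+k) into each frame's output vector, then to recover a single F0 per frame at synthesis time by averaging all window positions that refer to that frame, which also smooths fluctuating predictions. The function names `stack_context` and `unstack_context`, the half-width `k`, and the edge-repetition padding are hypothetical choices for this sketch, not details taken from the paper.

```python
from typing import List


def stack_context(f0: List[float], k: int) -> List[List[float]]:
    """Hypothetical sketch: build (2k+1)-dim targets from a per-frame F0 track.

    Frame t's target holds F0 at frames t-k..t+k. Frames outside the
    utterance are filled by repeating the first/last frame (an assumption).
    """
    n = len(f0)
    targets = []
    for t in range(n):
        window = [f0[min(max(t + o, 0), n - 1)] for o in range(-k, k + 1)]
        targets.append(window)
    return targets


def unstack_context(windows: List[List[float]], k: int) -> List[float]:
    """Recover one F0 value per frame by averaging every window entry
    that refers to that frame (entries pointing outside the utterance,
    i.e. the padded ones, are ignored)."""
    n = len(windows)
    sums = [0.0] * n
    counts = [0] * n
    for t, window in enumerate(windows):
        for j, value in enumerate(window):
            idx = t + j - k  # frame index this window entry describes
            if 0 <= idx < n:
                sums[idx] += value
                counts[idx] += 1
    # Every frame is covered at least by its own center entry, so counts > 0.
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    f0 = [100.0, 110.0, 120.0, 130.0]
    windows = stack_context(f0, k=1)
    print(windows)
    print(unstack_context(windows, k=1))
```

With exact (un-noisy) windows the round trip returns the original track; with network predictions, each frame's F0 becomes the mean of 2k+1 overlapping estimates, which is the smoothing effect this sketch is meant to illustrate.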

Keywords

vocoding-based statistical parametric speech synthesis, source-filter model, spectral frequency parameters, fundamental frequency parameters, reliable network architecture, F0 parameter suffers, multispeaker TTS systems, acoustic parameter selection strategies, deep learning-based speech synthesis system