Voice Morphing That Improves Tts Quality Using An Optimal Dynamic Frequency Warping-And-Weighting Transform

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)（2016）

引用 10|浏览40

暂无评分

摘要

Dynamic Frequency Warping (DFW) is widely used to align spectra of different speakers. It has long been argued that frequency warping captures inter-speaker differences but DFW practice always involves a tricky preprocessing part to remove spectral tilt. The DFW residual is successfully used in Voice Morphing to improve the quality and the similarity of synthesized speech but the estimation of the DFW residual remains largely heuristic and sub-optimal. This paper presents a dynamic programming algorithm that simultaneously estimates the Optimal Frequency Warping and Weighting transform (ODFWW) and therefore needs no preprocessing step and fine-tuning while source/target-speaker data are matched using the Matching-Minimization algorithm [1]. The transform is used to morph the output of a state-of-the-art Vocaine-based [2] TTS synthesizer in order to generate different voices in runtime with only +8% computational overhead. Some morphed TTS voices exhibit significantly higher quality than the original one as morphing seems to "correct" the voice characteristics of the TTS voice.

查看译文

关键词

voice-morphing,voice-transformation,DFW,vocaine,matching-minimization

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要