Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

Conference of the International Speech Communication Association (INTERSPEECH)(2022)

引用 0|浏览2
暂无评分
摘要
The recently developed pitch-controllable text-to-speech (TTS) model, i.e. FastPitch, was conditioned for the pitch contours. However, the quality of the synthesized speech degraded considerably for pitch values that deviated significantly from the average pitch; i.e. the ability to control pitch was limited. To address this issue, we propose two algorithms to improve the robustness of FastPitch. First, we propose a novel timbre-preserving pitch-shifting algorithm for natural pitch augmentation. Pitch-shifted speech samples sound more natural when using the proposed algorithm because the speaker's vocal timbre is maintained. Moreover, we propose a training algorithm that defines FastPitch using pitch-augmented speech datasets with different pitch ranges for the same sentence. The experimental results demonstrate that the proposed algorithms improve the pitch controllability of FastPitch.
更多
查看译文
关键词
timbre-preserving pitch-shifting algorithm, pitch augmentation, text-to-speech, FastPitch, VocGAN
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要