Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

ICASSP, pp. 6264-6268, 2020.

Abstract:

This paper proposes a hierarchical, fine-grained, and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution modeling of prosody by conditioning finer-level representations on coarser-level ones. Additionally, it imposes hierarchical conditioning across all latent dimens...
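
The idea of conditioning finer-level representations on coarser-level ones can be illustrated with a toy sketch. It is not the paper's model: the abstract excerpt does not give the architecture, so the Gaussian form, the function name `sample_fine_given_coarse`, and the parameters `fine_per_coarse` and `sigma` are all assumptions made for illustration. Each fine-level latent is simply drawn from a distribution centred on its coarse-level parent.

```python
import random


def sample_fine_given_coarse(coarse_latents, fine_per_coarse, sigma=0.1, seed=0):
    """Draw fine-level latents conditioned on coarse-level ones.

    Toy assumption (not taken from the paper):
        p(z_fine | z_coarse) = Normal(mean=z_coarse, std=sigma)
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    fine_latents = []
    for z_coarse, n_fine in zip(coarse_latents, fine_per_coarse):
        # Every fine-level unit under this coarse unit shares the same parent mean.
        children = [rng.gauss(z_coarse, sigma) for _ in range(n_fine)]
        fine_latents.append(children)
    return fine_latents


if __name__ == "__main__":
    # Two coarse units with 2 and 3 fine units, respectively (hypothetical values).
    fine = sample_fine_given_coarse([0.5, -1.0], [2, 3])
    for parent, children in zip([0.5, -1.0], fine):
        print(parent, children)
```

With a small `sigma`, the fine-level values stay close to their parent, so a change to a coarse value shifts all of its children together. That is the sense in which a coarser level constrains a finer one in this sketch.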
