Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
ICASSP, pp. 6264-6268, 2020.
Abstract:
This paper proposes a hierarchical, fine-grained and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution modeling of prosody by conditioning finer level representations on coarser level ones. Additionally, it imposes hierarchical conditioning across all latent dimens...