Towards A Linear Dynamical Model Based Speech Synthesizer

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 (2015)

Abstract
We present recent developments towards building a speech synthesis system based entirely on Linear Dynamical Models (LDMs). Specifically, we describe a decision tree-based context clustering approach for LDM-based speech synthesis and an algorithm for parameter generation with LDMs that takes global variance into account. To capture speech dynamics, LDMs need a coarser phoneme segmentation than the 5-state segmentation usually used in Hidden Markov Model (HMM)-based speech synthesis. Using LDMs to evaluate the clustering of these longer phoneme segments therefore improves the linguistic-to-acoustic mapping and yields trajectories of synthetic speech parameters that are free of discontinuities and closer to natural ones. It also reduces the footprint of the system, since the total number of decision tree leaves is smaller than in a typical HMM-based synthesizer. In addition, global variance greatly improves the naturalness of the synthesized speech. According to subjective evaluations, the proposed LDM-based system produces synthetic speech of quality similar to that of a baseline HMM-based synthesizer while using only 25% of its parameters.
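To make the two ideas above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard LDM (state-space) formulation with hypothetical parameter names `F`, `g` (state transition matrix and bias), `H`, `mu` (observation matrix and bias) and initial state mean `x0`, and generates the noise-free (expected) parameter trajectory for one segment, which evolves smoothly because each frame is derived from the previous state. The global variance step is deliberately swapped for a much simpler technique: per-dimension rescaling of the trajectory around its mean so that its variance matches an assumed target variance. The paper's algorithm integrates global variance into parameter generation itself, which this sketch does not attempt.

```python
import statistics


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def ldm_mean_trajectory(F, g, H, mu, x0, n_frames):
    """Expected (noise-free) output of an assumed LDM over one segment:
       x_1 = x0,  x_t = F x_{t-1} + g,  y_t = H x_t + mu.
    Returns a list of n_frames observation vectors."""
    x = list(x0)
    ys = []
    for _ in range(n_frames):
        ys.append(vec_add(mat_vec(H, x), mu))  # observation for this frame
        x = vec_add(mat_vec(F, x), g)          # advance the hidden state
    return ys


def scale_to_global_variance(traj, target_var):
    """Simplified GV-style postfilter (not the paper's method): rescale each
    dimension around its mean so its variance equals target_var[d]."""
    out = [list(frame) for frame in traj]
    for d in range(len(traj[0])):
        col = [frame[d] for frame in traj]
        mean = statistics.fmean(col)
        var = statistics.pvariance(col, mean)
        if var == 0:
            continue  # a flat trajectory cannot be rescaled
        k = (target_var[d] / var) ** 0.5
        for t, value in enumerate(col):
            out[t][d] = mean + k * (value - mean)
    return out


# Usage: a 2-dimensional hidden state observed through a 1-dimensional output.
F = [[0.9, 0.0], [0.0, 0.5]]
g = [0.1, 0.0]
H = [[1.0, 1.0]]
mu = [0.0]
traj = ldm_mean_trajectory(F, g, H, mu, x0=[0.0, 1.0], n_frames=4)
scaled = scale_to_global_variance(traj, target_var=[0.5])
```

Applying the rescaling after generation keeps the smooth shape of the LDM trajectory while counteracting the over-smoothing (reduced variance) typical of statistically generated parameters.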
Keywords
Statistical parametric speech synthesis, Linear dynamical models, Decision tree-based clustering, Global variance