Significance of Spectral Cues in Automatic Speech Segmentation for Indian Language Speech Synthesizers
Speech Communication (2020)
Abstract
Building speech synthesis systems for Indian languages is challenging because digital resources for these languages are scarce. Vocabulary-independent speech synthesis requires that a given text be split at the level of the smallest sound unit, namely the phone. The waveforms or models of phones are then concatenated to produce speech. When digital resources are scarce, the waveforms corresponding to the phones are obtained manually (by listening and marking). However, manual labeling of speech data (also known as speech segmentation) can lead to inconsistencies, as the duration of a phone can be as short as 10 ms.
Keywords
Speech segmentation, Signal processing cues, Short-term energy, Sub-band spectral flux, Hidden Markov model, Gaussian mixture model, Deep neural network, Convolutional neural network