Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion

INTERSPEECH(2019)

Abstract
Nearest Neighbor (NN)-based alignment techniques are popular in non-parallel Voice Conversion (VC). The performance of NN-based alignment improves when phone boundary information is available. However, estimating exact phone boundaries is a challenging task. If the text corresponding to the utterance is available, a Hidden Markov Model (HMM) can be used to identify the phone boundaries. However, this requires a large amount of training data, which is difficult to collect in realistic VC scenarios. Hence, we propose to exploit a Spectral Transition Measure (STM)-based alignment technique that does not require any a priori training data. The idea behind STM is that neurons in the auditory or visual cortex respond more strongly to transitional stimuli than to steady-state stimuli. The phone boundaries estimated using the STM algorithm are then applied to the NN technique to obtain the aligned spectral features of the source and target speakers. The proposed STM+NN alignment technique gives, on average, a 13.67% relative improvement in phonetic accuracy (PA) compared to the NN-based alignment technique. The improvement in %PA after alignment is reflected in better speech quality and speaker similarity of the converted voice (relative improvements of 13.63% and 13.26%, respectively).
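The abstract does not give the exact STM formulation or how the boundaries constrain the NN search, so the following is only a minimal sketch under stated assumptions. It uses a commonly cited form of STM (the mean squared regression slope of each cepstral coefficient over a short window), picks local maxima as candidate phone boundaries, and shows a plain frame-wise NN search. The function names, the window half-length `half_win`, the `rel_threshold` and the `min_gap` are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def spectral_transition_measure(feats, half_win=2):
    """Return an STM contour of shape (T,) for cepstral frames of shape (T, D).

    Assumed form: per-coefficient linear regression slope over a window of
    2 * half_win + 1 frames, squared and averaged across coefficients.
    """
    T, D = feats.shape
    offsets = np.arange(-half_win, half_win + 1)
    denom = np.sum(offsets ** 2)
    # Repeat the edge frames so every frame has a full window.
    padded = np.pad(feats, ((half_win, half_win), (0, 0)), mode="edge")
    slopes = np.zeros((T, D))
    for n in offsets:
        slopes += n * padded[half_win + n : half_win + n + T]
    slopes /= denom
    return np.mean(slopes ** 2, axis=1)

def pick_boundaries(stm, min_gap=3, rel_threshold=0.1):
    """Return frame indices of local STM maxima (candidate phone boundaries).

    rel_threshold and min_gap are illustrative assumptions.
    """
    thresh = rel_threshold * stm.max()
    peaks = []
    for t in range(1, len(stm) - 1):
        is_peak = stm[t] >= stm[t - 1] and stm[t] > stm[t + 1]
        if is_peak and stm[t] > thresh:
            if not peaks or t - peaks[-1] >= min_gap:
                peaks.append(t)
    return peaks

def nearest_neighbor_align(src, tgt):
    """For each source frame, return the index of the closest target frame
    by squared Euclidean distance (plain NN, no boundary constraint)."""
    dists = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(dists, axis=1)

# Toy example: a step change in the features between frames 9 and 10.
toy = np.vstack([np.zeros((10, 3)), np.ones((10, 3))])
toy_boundaries = pick_boundaries(spectral_transition_measure(toy))
```

In this sketch the detected boundaries split each utterance into segments; one plausible way to combine the two steps, again an assumption, is to run `nearest_neighbor_align` only between frames of corresponding source and target segments rather than over the whole utterance.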
Keywords
Voice Conversion, Spectral Transition Measure, INCA