A Fast High-Fidelity Source-Filter Vocoder With Lightweight Neural Modules.

IEEE/ACM Trans. Audio Speech Lang. Process. (2023)

Abstract
The quality of the raw audio waveform generated by a vocoder affects a wide range of audio generative tasks. In recent years, the dominance of source-filter vocoders has been greatly challenged by neural vocoders, as the latter produce far superior synthesized audio quality. However, neural vocoders introduced new limitations, including low runtime efficiency and unstable pitch, especially in models without an explicit periodic excitation input, neither of which had been a problem for source-filter vocoders. In this article we present a novel approach that takes the best of both worlds. We begin with an in-depth examination of every building block in WORLD, one of the best-performing source-filter vocoders built on plain signal processing algorithms, identify the blocks that do not work well, and replace them with small, lightweight, task-specific neural network models. We also rearrange the vocoding pipeline so that the building blocks cooperate more smoothly. Our objective and subjective evaluations demonstrate that our method achieves synthesized audio quality competitive with neural vocoders at a much lower computational cost, while retaining the spectral envelope acoustic feature and the high pitch accuracy of conventional source-filter vocoders.
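
For context, the sketch below illustrates the conventional WORLD-style analysis/synthesis pipeline that the paper starts from, using the pyworld Python bindings; it is not the authors' implementation. The commented-out neural_spectral_envelope call is a hypothetical placeholder marking where, following the paper's idea, a poorly performing signal-processing block could be swapped for a lightweight neural module.

import numpy as np
import pyworld
import soundfile as sf

def analyze_and_resynthesize(in_path: str, out_path: str) -> None:
    x, fs = sf.read(in_path)
    x = x.astype(np.float64)

    # Stage 1: F0 estimation (Harvest). The explicit periodic excitation
    # derived from F0 keeps pitch stable, unlike many purely neural vocoders.
    f0, t = pyworld.harvest(x, fs)

    # Stage 2: spectral envelope estimation (CheapTrick). A block of this
    # kind is a candidate for replacement by a small neural model.
    sp = pyworld.cheaptrick(x, f0, t, fs)
    # sp = neural_spectral_envelope(x, f0, t, fs)  # hypothetical swap-in

    # Stage 3: band aperiodicity estimation (D4C).
    ap = pyworld.d4c(x, f0, t, fs)

    # Synthesis: filter the periodic/noise excitation with the envelope.
    y = pyworld.synthesize(f0, sp, ap, fs)
    sf.write(out_path, y, fs)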
Keywords
high-fidelity, source-filter