Joint Singing Voice Separation And F0 Estimation With Deep U-Net Architectures

2019 27th European Signal Processing Conference (EUSIPCO), 2019

Cited by 8 | Viewed 7
Abstract
Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic, vocal-f0 estimation for many real-world cases.
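The stacked configuration the abstract describes (vocal separation first, then f0 estimation on the separated signal) can be sketched as a two-stage pipeline. This is a minimal illustration with toy stand-in functions, not the paper's actual U-Net models: `mask_net` and `salience_net` here are hypothetical placeholders for the learned separation and pitch-salience networks.

```python
import numpy as np

def separate_vocals(mix_spec, mask_net):
    """Stage 1: predict a soft time-frequency mask and apply it to the mixture."""
    mask = mask_net(mix_spec)          # values in [0, 1]
    return mask * mix_spec

def estimate_f0(vocal_spec, salience_net):
    """Stage 2: frame-wise pitch salience; pick the strongest frequency bin per frame."""
    salience = salience_net(vocal_spec)
    return salience.argmax(axis=0)     # one bin index per time frame

# Toy stand-ins for the two networks (in the paper, both stages are
# deep U-Nets trained jointly; these are illustrative placeholders).
mask_net = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigmoid "mask"
salience_net = lambda x: x                      # identity "salience"

mix = np.random.rand(64, 100)                   # 64 freq bins x 100 frames
vocals = separate_vocals(mix, mask_net)
f0_bins = estimate_f0(vocals, salience_net)
print(vocals.shape, f0_bins.shape)              # (64, 100) (100,)
```

In the stacked variant the paper favors, stage 2 consumes the output of stage 1, so errors in separation directly shape the f0 estimate; joint training lets the f0 loss propagate back into the separator.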
Keywords
music, voice, singing, fundamental frequency estimation, pitch, melody, source separation, multitask learning