Singing Voice Conversion Using Posted Waveform Data on Music Social Media.

Koki Senda, Yukiya Hono, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (2018)

Abstract

This paper proposes a method of selecting training data for many-to-one singing voice conversion (VC) from waveform data on the social media music app "nana." On this app, users can share sounds such as speech, singing, and instrumental music recorded with their smartphones. More than one million hours of waveform data have accumulated, enough to be regarded as "big data." Big data is widely known to create substantial value when combined with advanced deep learning technology. nana's database contains many posts in which multiple users sing the same song. This data is considered suitable training data for VC, because VC frameworks based on statistical approaches often require parallel data sets: pairs of waveform data in which source and target singers sing the same phrases. The proposed method composes parallel data sets for many-to-one statistical VC from nana's database by extracting frames that have small differences in utterance timing, based on the results of dynamic programming (DP) matching. Experimental results indicate that a system trained on data composed by our method converts acoustic features more accurately than a system that does not use the method.
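The frame-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes DP matching means classic dynamic time warping over per-frame feature vectors, and it assumes "small differences in the timing of utterances" can be approximated by keeping aligned frame pairs whose index offset is at most a threshold. The names `dtw_path`, `select_frames`, and `max_shift` are hypothetical.

```python
import math


def dtw_path(src, tgt):
    """Align two sequences of feature vectors with classic DTW.

    Returns the warping path as a list of (src_index, tgt_index) pairs.
    Assumption: Euclidean distance between frames as the local cost.
    """
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance source only
                cost[i][j - 1],      # advance target only
            )

    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def select_frames(path, max_shift):
    """Keep aligned pairs whose timing offset is at most max_shift frames.

    Assumption: the offset |i - j| stands in for the timing difference
    described in the abstract; the paper's actual criterion may differ.
    """
    return [(i, j) for i, j in path if abs(i - j) <= max_shift]


if __name__ == "__main__":
    # Toy one-dimensional "features": the target starts two frames late.
    source = [[0.0], [1.0], [2.0], [3.0]]
    target = [[0.0], [0.0], [0.0], [1.0], [2.0], [3.0]]
    alignment = dtw_path(source, target)
    print(select_frames(alignment, max_shift=1))
```

In this sketch a smaller `max_shift` discards more of the aligned pairs, keeping only frames where the two singers are nearly synchronous; how the paper sets its own threshold is not stated in the abstract.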

Keywords

singing voice conversion, social media music app nana, social media app, instrumental music, accumulated waveform data, advanced deep learning technology, post data, suitable training data, VC frameworks, parallel data sets, posted waveform data, music social media, Big Data
