Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet.

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (2018)

Abstract
This paper describes a method that generates a continuous F0 contour of a singing voice from a monophonic sequence of musical notes (a musical score) by using a deep neural autoregressive model called WaveNet. Real F0 contours include complicated temporal and frequency fluctuations caused by singing expressions such as vibrato and portamento. Although explicit models such as hidden Markov models (HMMs) have often been used to represent F0 dynamics, their limited representation capability makes it difficult to generate realistic F0 contours. To overcome this limitation, WaveNet, which was invented for modeling raw waveforms in an unsupervised manner, was recently used for generating singing F0 contours from a musical score with lyrics in a supervised manner. Inspired by this attempt, we investigate the capability of WaveNet for generating singing F0 contours without using lyric information. Our method conditions WaveNet on pitch and contextual features of a musical score. As a loss function better suited to generating F0 contours, we adopt a modified cross-entropy loss weighted by the squared error between the target and output F0s on the log-frequency axis. The experimental results show that these techniques improve the quality of the generated F0 contours.
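The abstract does not give the exact form of the modified loss, so the following is only a minimal sketch of one plausible reading. It assumes that F0 is quantized into bins on a log-frequency (cent) axis, that the model outputs one logit per bin for each frame, and that the per-frame cross-entropy is multiplied by `1 + scale * err**2`, where `err` is the distance in semitones between the most likely output bin and the target bin. The function names, the `scale` parameter, and the weighting formula are all hypothetical and are not taken from the paper.

```python
import numpy as np

def hz_to_cents(f0_hz, ref_hz=440.0):
    """Map F0 in Hz to cents relative to ref_hz (a log-frequency axis)."""
    return 1200.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def weighted_cross_entropy(logits, target_bins, bin_centers_cents, scale=1.0):
    """Cross-entropy weighted by squared log-frequency error (assumed form).

    logits:            (frames, bins) unnormalized scores per F0 bin
    target_bins:       (frames,) index of the target F0 bin per frame
    bin_centers_cents: (bins,) center of each bin in cents
    scale:             hypothetical knob for the strength of the weighting
    """
    logits = np.asarray(logits, dtype=float)
    target_bins = np.asarray(target_bins)
    centers = np.asarray(bin_centers_cents, dtype=float)

    # Numerically stable log-softmax over the bin axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Plain per-frame cross-entropy against the target bin.
    frames = np.arange(len(target_bins))
    ce = -log_probs[frames, target_bins]

    # Assumption: the "output F0" is the center of the most likely bin.
    predicted_bins = logits.argmax(axis=1)
    err_semitones = (centers[predicted_bins] - centers[target_bins]) / 100.0

    # Assumed weighting: a correct prediction leaves the loss unchanged,
    # and distant errors are penalized more than near misses.
    weights = 1.0 + scale * err_semitones ** 2
    return float(np.mean(weights * ce))
```

Under this reading, a prediction one semitone away from the target costs less than one four semitones away even when both assign the same probability to the target bin, which plain cross-entropy cannot distinguish.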
Keywords
pitch features, contextual features, log-frequency axis, square error, cross-entropy loss, WaveNet, hidden Markov models, singing expressions, deep neural autoregressive model, musical score, musical notes, singing voice, continuous F0 contour, musical note sequences, sequential generation