Melody generation based on deep ensemble learning using varying temporal context length

Multimedia Tools and Applications (2024)

Abstract
Music has always been a powerful medium for expressing human emotions and feelings that mere words sometimes cannot convey. As a result, generating music with machine learning and deep learning approaches has been popular for some time. It is a challenging and interesting task, as imitating human creativity is not easy. This paper attempts effective melody generation using sequential deep learning models, particularly LSTMs (Long Short-Term Memory networks). In this context, note that previous works exhibit two principal limitations. Firstly, a significant majority of studies rely on RNN variants that cannot effectively remember long past sequences. Secondly, they often do not consider varying temporal context lengths during data modeling for melody generation. In this work, experiments have been performed with different LSTM variants, namely Vanilla LSTM, Multi-Layer LSTM, and Bidirectional LSTM, each with different temporal context lengths, to find the optimal LSTM model and the optimal timestep for efficient melody generation. Moreover, ensembles of the best-performing techniques for each genre (e.g., classical, country, jazz, and pop) are implemented to see whether they can generate better melodies than the corresponding individual models. Finally, a qualitative evaluation of the generated melodies is carried out through a survey circulated among fellow colleagues and within the ISMIR community, in which participants rated each audio sample on a scale of 1-5, helping us assess the quality of the generated music. All models have been validated on four datasets that we manually prepared by genre, namely Classical, Jazz, Country, and Pop.
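The two data-modeling ideas highlighted in the abstract, slicing a note sequence into fixed-length context windows for a chosen timestep, and averaging the next-note probability distributions of several models as an ensemble, can be sketched as follows. This is an illustrative sketch, not code from the paper: the function names, the MIDI-number note encoding, and the plain averaging scheme are assumptions.

```python
# Sketch (not from the paper): window a note sequence for a given temporal
# context length, and average next-note distributions from several models.
# Names and the note encoding are illustrative assumptions.

def make_windows(notes, timestep):
    """Slice an encoded note sequence into (context, target) pairs.

    With a temporal context length of `timestep`, each input is that many
    consecutive notes and the target is the note that follows them.
    """
    return [
        (notes[i:i + timestep], notes[i + timestep])
        for i in range(len(notes) - timestep)
    ]

def ensemble_average(distributions):
    """Average per-note probability distributions from multiple models."""
    n = len(distributions)
    size = len(distributions[0])
    return [sum(d[k] for d in distributions) / n for k in range(size)]

# Toy melody encoded as MIDI note numbers; two candidate context lengths.
melody = [60, 62, 64, 65, 67, 69, 71, 72]
for t in (2, 4):
    pairs = make_windows(melody, t)
    print(f"timestep={t}: {len(pairs)} training pairs")
```

In the paper's setting, the windowed pairs would feed an LSTM trained to predict the target note from its context, and ensembling would combine the per-genre models' output distributions before sampling the next note.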
Keywords
Melody generation, Deep learning, LSTM, Ensemble learning, MIDI