SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

Abstract

Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry. In automatic song writing, lyric-to-melody generation and melody-to-lyric generation are two important tasks, both of which usually suffer from the following challenges: 1) the paired lyric and melody data is limited, and 2) strict and precise alignment between lyric and melody is hard to achieve.

Introduction
  • Automatic song writing is an interesting and challenging task in both research and industry.
  • Previous works (Bao et al. 2019; Li et al. 2020; Watanabe et al. 2018; Lee, Fang, and Ma 2019) on lyric-to-melody (L2M) and melody-to-lyric (M2L) generation have not considered the scenario of limited paired data, and rely only on greedy decisions for lyric-melody alignment, which cannot well address these challenges.
  • The authors propose SongMASS, which uses a masked sequence-to-sequence pre-training method to leverage unpaired lyric and melody data, and attention-based alignment constraints for global and precise lyric-melody alignment (a sketch of the masking idea follows this list).
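
As a rough illustration of the masked sequence-to-sequence pre-training idea (MASS; Song et al. 2019), the sketch below builds one pre-training example from an unpaired lyric line: the encoder sees the line with a contiguous span masked, and the decoder is trained to reconstruct exactly that span. The tokenization and mask token here are hypothetical, not the paper's exact setup.

```python
import random

MASK = "[MASK]"  # hypothetical mask token

def mass_example(tokens, mask_ratio=0.5):
    """Build one MASS-style pre-training example: mask a contiguous
    span in the encoder input; the decoder predicts only that span."""
    n = len(tokens)
    span_len = max(1, int(n * mask_ratio))
    start = random.randint(0, n - span_len)
    end = start + span_len
    encoder_input = tokens[:start] + [MASK] * span_len + tokens[end:]
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target

# e.g. one unpaired lyric line used for lyric-to-lyric pre-training
enc, dec = mass_example("shine on you crazy diamond".split())
print(enc, dec)
```

The same recipe applies on the melody side, with note tokens in place of words.
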
Highlights
  • Automatic song writing is an interesting and challenging task in both research and industry
  • We propose SongMASS, an automatic song writing system for L2M and M2L, which addresses the first challenge with masked sequence-to-sequence pre-training and the second challenge with an attention-based alignment constraint.
  • The main results of the objective evaluation of lyric-to-melody and melody-to-lyric generation are shown in Table 1.
  • We have proposed SongMASS, an automatic song writing system for both lyric-to-melody and melody-to-lyric generation, which leverages masked sequence-to-sequence pre-training and an attention-based alignment constraint.
  • We introduce sentence-level and token-level alignment constraints, and a dynamic programming algorithm to obtain accurate alignments between lyric and melody (a toy sketch of the sentence-level constraint follows this list).
  • Experimental results show that our proposed SongMASS greatly improves the quality of lyric-to-melody and melody-to-lyric generation compared with the baseline
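
To make the attention-based alignment constraint concrete, here is a minimal sketch of one plausible form of the sentence-level constraint: given sentence (or melody-phrase) boundaries on both sides, the attention mass that a target token places outside its corresponding source sentence is penalized. The loss form and boundary encoding are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def sentence_alignment_loss(attn, src_sent_ids, tgt_sent_ids):
    """attn: (tgt_len, src_len) attention weights, rows summing to 1.
    *_sent_ids: the sentence index of each token on each side.
    Returns the mean attention mass falling outside the matching sentence."""
    # mask[t, s] = 1 where target token t and source token s belong
    # to sentences with the same index
    mask = (tgt_sent_ids[:, None] == src_sent_ids[None, :]).float()
    outside_mass = (attn * (1.0 - mask)).sum(dim=-1)
    return outside_mass.mean()

# toy example: first two target tokens belong to sentence 0, the last to sentence 1
attn = torch.tensor([[0.7, 0.2, 0.1],
                     [0.5, 0.4, 0.1],
                     [0.1, 0.2, 0.7]])
print(sentence_alignment_loss(attn,
                              src_sent_ids=torch.tensor([0, 0, 1]),
                              tgt_sent_ids=torch.tensor([0, 0, 1])))
```
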
Methods
  • 3.1 System Overview

    The overall architecture of SongMASS for L2M and M2L is shown in Figure 2, which adopts a Transformer-based encoder-decoder framework (Vaswani et al. 2017).
  • Pre-training Method: The authors further investigate the effectiveness of each design in the pre-training method, including using separate encoder-decoders for lyric-to-lyric and melody-to-melody pre-training, and using supervised pre-training to learn a shared latent space between lyric and melody.
  • From Table 1, removing the separate encoder-decoders and removing the supervised loss both result in worse performance than SongMASS, which demonstrates the effectiveness of the two designs.
  • Alignment Strategy: The authors study the effectiveness of the sentence-level and token-level alignment constraints on the alignment accuracy between melodies and lyrics.
  • Table 3 shows that the alignment accuracy drops drastically without DP, demonstrating the importance of DP for accurate alignments (a DTW-style sketch follows this list).
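
The paper's dynamic programming step extracts a monotonic token-level alignment from the attention weights. The exact recurrence is not given in this summary, but a DTW-style sketch (cf. Berndt and Clifford 1994) conveys the idea: find the monotonic path through the attention matrix with maximum total weight, then read off which source tokens each target token aligns to.

```python
import numpy as np

def dp_alignment(attn):
    """attn: (tgt_len, src_len) attention matrix.
    Returns a monotonic alignment path of (tgt, src) index pairs that
    maximizes total attention, moving right, down, or diagonally."""
    T, S = attn.shape
    score = np.full((T, S), -np.inf)
    score[0, 0] = attn[0, 0]
    for t in range(T):
        for s in range(S):
            for pt, ps in ((t - 1, s), (t, s - 1), (t - 1, s - 1)):
                if pt >= 0 and ps >= 0:
                    score[t, s] = max(score[t, s], score[pt, ps] + attn[t, s])
    # backtrack from the bottom-right corner along the best predecessors
    path, t, s = [(T - 1, S - 1)], T - 1, S - 1
    while (t, s) != (0, 0):
        cands = [(pt, ps) for pt, ps in ((t - 1, s), (t, s - 1), (t - 1, s - 1))
                 if pt >= 0 and ps >= 0]
        t, s = max(cands, key=lambda p: score[p])
        path.append((t, s))
    return path[::-1]
```

One plausible use of the extracted path, consistent with the summary above, is to read off which melody tokens each lyric token aligns to; the paper's precise recurrence and usage may differ from this sketch.
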
Results
  • 4.1 Experimental Setup

    Dataset: Unpaired Lyric and Melody. The authors use “380,000+ lyrics from MetroLyrics” as the unpaired lyrics for pre-training, which contains 362,237 songs.
  • The subjective evaluations are shown in Table 2, from which the authors can see that the lyrics and melodies generated by SongMASS obtain better average scores in all subjective metrics.
  • These results demonstrate the effectiveness of SongMASS in generating high-quality lyric and melody.
  • As shown in Table 1, removing each component results in worse performance than SongMASS, demonstrating the contribution of pre-training and alignment constraint.
Conclusion
  • The authors have proposed SongMASS, an automatic song writing system for both lyric-to-melody and melody-to-lyric generation, which leverages masked sequence-to-sequence pre-training and an attention-based alignment constraint.
  • Experimental results show that the proposed SongMASS greatly improves the quality of lyric-to-melody and melody-to-lyric generation compared with the baseline.
  • The authors will investigate other sequence-to-sequence pre-training methods and more advanced alignment algorithms for lyric-to-melody and melody-to-lyric generation.
Tables
  • Table 1: Results of lyric-to-melody and melody-to-lyric generation in objective evaluation.
  • Table 2: Subjective evaluation results. Average scores and standard deviations are shown for each measure.
  • Table 3: Analyses of the designs in alignment constraints.
Funding
  • This research was supported by the National Key Research And Development Program of China (No.2019YFB1405802)
Study subjects and analysis
We calculate the ratio of matches among all source tokens and all songs in the test set to obtain the alignment accuracy. For subjective evaluation, we invite 5 participants with professional knowledge in music and singing as human annotators to evaluate 10 songs (338 pairs of generated lyric sentences and melody phrases) randomly selected from our test set. We require each annotator to answer some questions using a five-point scale, from 1 (Poor) to 5 (Perfect).
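
As a minimal sketch of this accuracy computation, assuming predicted and ground-truth alignments are available as one aligned target position (or span) per source token, per song (the data layout here is hypothetical):

```python
def alignment_accuracy(pred_aligns, gold_aligns):
    """pred_aligns / gold_aligns: list of songs, each a list containing
    the predicted / ground-truth aligned target positions for every
    source token. Returns the fraction of source tokens whose predicted
    alignment equals the ground truth, over all songs."""
    matched = total = 0
    for pred_song, gold_song in zip(pred_aligns, gold_aligns):
        for p, g in zip(pred_song, gold_song):
            matched += int(p == g)
            total += 1
    return matched / total if total else 0.0
```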

References
  • Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv e-prints abs/1409.0473.
  • Bao, H.; Huang, S.; Wei, F.; Cui, L.; Wu, Y.; Tan, C.; Piao, S.; and Zhou, M. 2019. Neural Melody Composition from Lyrics. In NLPCC, volume 11838, 499–511.
  • Berndt, D. J.; and Clifford, J. 1994. Using dynamic time warping to find patterns in time series. In KDD workshop, volume 10, 359–370.
  • Choi, K.; Fazekas, G.; and Sandler, M. B. 2016. Text-based LSTM networks for Automatic Music Composition. CoRR abs/1604.05358.
  • Devlin, J.; Chang, M.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL, 4171–4186.
  • Kingma, D. P.; and Ba, J. 2015. Adam: A Method for Stochastic Optimization. In Bengio, Y.; and LeCun, Y., eds., ICLR.
  • Lee, H.-P.; Fang, J.-S.; and Ma, W.-Y. 2019. iComposer: An Automatic Songwriting System for Chinese Popular Music. In NAACL, 84–88.
  • Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; and Zettlemoyer, L. 2019. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv preprint arXiv:1910.13461.
  • Li, P.; Zhang, H.; Liu, X.; and Shi, S. 2020. Rigid Formats Controlled Text Generation. In ACL, 742–751.
  • Lu, X.; Wang, J.; Zhuang, B.; Wang, S.; and Xiao, J. 2019. A Syllable-Structured, Contextually-Based Conditionally Generation of Chinese Lyrics. In PRICAI, volume 11672, 257–265.
  • Luong, T.; Pham, H.; and Manning, C. D. 2015. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP, 1412–1421.
  • Malmi, E.; Takala, P.; Toivonen, H.; Raiko, T.; and Gionis, A. 2015. DopeLearning: A Computational Approach to Rap Lyrics Generation. CoRR abs/1505.04771.
  • Needleman, S. B.; and Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48: 443–453.
  • Radford, A.; Narasimhan, K.; Salimans, T.; and Sutskever, I. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
  • Raffel, C. 2016. Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching. PhD Thesis.
  • Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Ren, Y.; He, J.; Tan, X.; Qin, T.; Zhao, Z.; and Liu, T. 2020. PopMAG: Pop Music Accompaniment Generation. CoRR abs/2008.07703. URL https://arxiv.org/abs/2008.07703.
  • Rush, A. M.; Chopra, S.; and Weston, J. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In EMNLP, 379–389.
  • Song, K.; Tan, X.; Qin, T.; Lu, J.; and Liu, T. 2019. MASS: Masked Sequence to Sequence Pre-training for Language Generation. In ICML, volume 97, 5926–5936.
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2017. Attention is All you Need. In NIPS, 5998–6008.
  • Watanabe, K.; Matsubayashi, Y.; Fukayama, S.; Goto, M.; Inui, K.; and Nakano, T. 2018. A Melody-Conditioned Lyrics Language Model. In NAACL, 163–172.
  • Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R. R.; and Le, Q. V. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NIPS, 5753–5763.
  • Yu, Y.; and Canales, S. 2019. Conditional LSTM-GAN for Melody Generation from Lyrics. CoRR abs/1908.05551.
  • Zhu, H.; Liu, Q.; Yuan, N. J.; Qin, C.; Li, J.; Zhang, K.; Zhou, G.; Wei, F.; Xu, Y.; and Chen, E. 2018. XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music. In KDD, 2837–2846.