A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation
Deep learning algorithms have been widely proposed for automatic music generation. However, few objective approaches have been proposed to assess whether a melody was created by a machine or a human. The Conference on Sound and Music Technology (2020) provides a great opportunity to tackle this problem. In this paper, ...
- Machine learning methods have been widely proposed for automatic music generation, especially since the significant progress in deep learning.
- More and more melodies can be composed by deep-learning automatons, using the pitch and duration of the notes in human music as primary inputs to mimic human composition [1,2,3,4].
- Finding a relatively general and objective way to evaluate how melodies are produced across various musical styles can make different music tasks comparable.
- The purpose of this study is to find an objective and effective method that yields an indicator of whether a melody is human-composed, by analyzing AI-made melodies
- Random insertion and deletion run a high risk in this case, so we propose two data augmentation methods
- A new method is provided in our research on tiny-dataset training, including data augmentation via music tune transposition and MIDI sequence truncation, and prevention of over-fitting
- An approach based on a masked language model with ALBERT is put forward to distinguish whether the composer of a music piece is human
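The two augmentation methods could be sketched roughly as follows; the transposition offsets, window length, and function names here are illustrative assumptions, not the paper's actual settings:

```python
import random

def transpose(pitches, semitones):
    """Shift every MIDI pitch by a fixed number of semitones,
    rejecting shifts that leave the valid MIDI range 0-127."""
    shifted = [p + semitones for p in pitches]
    if all(0 <= p <= 127 for p in shifted):
        return shifted
    return None

def truncate(pitches, max_len):
    """Keep a random contiguous window of at most max_len notes."""
    if len(pitches) <= max_len:
        return list(pitches)
    start = random.randrange(len(pitches) - max_len + 1)
    return pitches[start:start + max_len]

def augment(pitches, offsets=range(-6, 7), max_len=64):
    """Expand one pitch sequence into several transposed, truncated copies."""
    out = []
    for k in offsets:
        shifted = transpose(pitches, k)
        if shifted is not None:
            out.append(truncate(shifted, max_len))
    return out
```

Transposition preserves the melodic contour while multiplying the number of training sequences, which is why it is safer on a tiny dataset than random insertion or deletion.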
- The pipeline is shown in Fig. 2.
- The training set undergoes data preprocessing and is then expanded by data augmentation.
- The details are available in the following subsections.
- A Masked Language Model (MLM) task based on ALBERT is trained as an autoencoder on the expanded training set.
- The trained model is then used for evaluation.
- See Section 4 for details of training and evaluation
- For a pitch sequence, each note is masked in turn.
- The probability $p_i$ of the $i$-th masked note is predicted by the trained ALBERT, and the average probability over all notes is taken as the probability that the piece was composed by AI.
- With $n$ notes in the pitch sequence, the probability of AI generation is $p = \frac{1}{n}\sum_{i=1}^{n} p_i$.
- The probability that each piece was created by a human, which this task requires, is then $1 - p$
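The scoring scheme above can be sketched generically; `predict_masked` stands in for the trained ALBERT and is a hypothetical callable that returns the probability the model assigns to the true pitch at a masked position:

```python
MASK = -1  # illustrative placeholder id for a masked pitch

def ai_probability(pitches, predict_masked):
    """Mask each note in turn, collect the model's probability p_i for
    the true pitch at position i, and average: p = (1/n) * sum(p_i)."""
    probs = []
    for i, true_pitch in enumerate(pitches):
        masked = list(pitches)
        masked[i] = MASK
        probs.append(predict_masked(masked, i, true_pitch))
    return sum(probs) / len(probs)

def human_probability(pitches, predict_masked):
    """The probability the task requires is the complement 1 - p."""
    return 1.0 - ai_probability(pitches, predict_masked)
```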
- This research provides a new method for tiny-dataset training, including data augmentation via music tune transposition and MIDI sequence truncation, and prevention of over-fitting.
- Given the good behavior of the pre-trained model, the authors believe it is worth more extensive application in MIDI sequence encoding tasks.
- Due to limited computing resources, there is no experiment on whether the model runs well with a larger batch size or a larger transformer, which deserves further attention
- Each time, about 15% of the elements in a pitch sequence are randomly masked, and the unmasked elements are then used to predict the masked ones
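This masking step might look roughly like the following; the mask token id and the handling of the 15% ratio are assumptions for illustration (BERT-style masking also sometimes keeps or randomizes tokens, which is omitted here):

```python
import random

MASK_TOKEN = 128  # assumed id just above the 0-127 MIDI pitch range

def mask_for_mlm(pitches, mask_ratio=0.15, rng=random):
    """Randomly mask about mask_ratio of the positions; return the
    corrupted sequence and the (position, original pitch) targets
    the model must predict from the unmasked context."""
    n_mask = max(1, round(len(pitches) * mask_ratio))
    positions = rng.sample(range(len(pitches)), n_mask)
    corrupted = list(pitches)
    targets = []
    for i in sorted(positions):
        targets.append((i, corrupted[i]))
        corrupted[i] = MASK_TOKEN
    return corrupted, targets
```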
- Li Z, Li S. A comparison of melody created by artificial intelligence and human based on mathematical model[C]// Springer. Proceedings of the 7th Conference on Sound and Music Technology (CSMT). [S.l.]: Springer, 2020: 121–130.
- Liu C H, Ting C K. Computational intelligence in music composition: A survey[J]. IEEE Transactions on Emerging Topics in Computational Intelligence. 2016, 1(1):2–15.
- Dong H W, Hsiao W Y, Yang L C, et al. Musegan: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment[J]. arXiv preprint arXiv:1709.06298. 2017.
- Wu C L, Liu C H, Ting C K. A novel genetic algorithm considering measures and phrases for generating melody[C]// IEEE. 2014 IEEE Congress on Evolutionary Computation (CEC). [S.l.]: IEEE, 2014: 2101–2107.
- Ren I Y. Using shannon entropy to evaluate automatic music generation systems: A case study of bach’s chorales[J]. ECE Department, University of Rochester. 2015.
- Liang F T, Gotham M, Johnson M, et al. Automatic stylistic composition of bach chorales with deep lstm.[C]. ISMIR. [S.l.], 2017: 449–456.
- Chu H, Urtasun R, Fidler S. Song from pi: A musically plausible network for pop music generation[J]. arXiv preprint arXiv:1611.03477. 2016.
- Huang A, Wu R. Deep learning for music[J]. arXiv preprint arXiv:1606.04930. 2016.
- Unehara M, Onisawa T. Composition of music using human evaluation[C]// IEEE. 10th IEEE International Conference on Fuzzy Systems (Cat. No. 01CH37297): volume 3. [S.l.]: IEEE, 2001: 1203–1206.
- Maeda Y, Kajihara Y. Automatic generation method of twelve tone row for musical composition used genetic algorithm[C]// IEEE. 2009 IEEE International Conference on Fuzzy Systems. [S.l.]: IEEE, 2009: 963–968.
- Pollastri E, Simoncelli G. Classification of melodies by composer with hidden markov models[C]// IEEE. Proceedings First International Conference on WEB Delivering of Music. WEDELMUSIC 2001. [S.l.]: IEEE, 2001: 88–95.
- Ogihara M, Li T. N-gram chord profiles for composer style representation.[C]. ISMIR. [S.l.], 2008: 671–676.
- Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805. 2018.
- Peters M E, Neumann M, Iyyer M, et al. Deep contextualized word representations[J]. arXiv preprint arXiv:1802.05365. 2018.
- Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training[M]. [S.l.]: [s.n.], 2018.
- Liu A T, Yang S W, Chi P H, et al. Mockingjay: Unsupervised speech representation learning with deep bidirectional transformer encoders[C]// IEEE. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). [S.l.]: IEEE, 2020: 6419–6423.
- Jiang D, Lei X, Li W, et al. Improving transformer-based speech recognition using unsupervised pre-training[J]. arXiv preprint arXiv:1910.09932. 2019.
- Ling S, Liu Y, Salazar J, et al. Deep contextualized acoustic representations for semisupervised speech recognition[C]// IEEE. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). [S.l.]: IEEE, 2020: 6429–6433.
- Baskar M K, Watanabe S, Astudillo R, et al. Semi-supervised sequence-to-sequence asr using unpaired speech and text[J]. arXiv preprint arXiv:1905.01152. 2019.
- Schneider S, Baevski A, Collobert R, et al. wav2vec: Unsupervised pre-training for speech recognition[J]. arXiv preprint arXiv:1904.05862. 2019.
- Lan Z, Chen M, Goodman S, et al. Albert: A lite bert for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942. 2019.
- Chi P H, Chung P H, Wu T H, et al. Audio albert: A lite bert for self-supervised learning of audio representation[J]. arXiv preprint arXiv:2005.08575. 2020.
- Kim Y E, Chai W, Garcia R, et al. Analysis of a contour-based representation for melody.[C]. ISMIR. [S.l.], 2000.
- Wei J, Zou K. Eda: Easy data augmentation techniques for boosting performance on text classification tasks[J]. arXiv preprint arXiv:1901.11196. 2019.
- Lee J, Lee Y, Kim J, et al. Set transformer: A framework for attention-based permutation-invariant neural networks[C]// PMLR. International Conference on Machine Learning. [S.l.]: PMLR, 2019: 3744–3753.
- Ishida T, Yamane I, Sakai T, et al. Do we need zero training loss after achieving zero training error?[J]. arXiv preprint arXiv:2002.08709. 2020.
- Raffel C, Ellis D P. Intuitive analysis, creation and manipulation of midi data with pretty midi[C]. 15th International Society for Music Information Retrieval Conference Late Breaking and Demo Papers. [S.l.], 2014: 84–93.
- Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style, high-performance deep learning library[C]. Advances in neural information processing systems. [S.l.], 2019: 8026– 8037.
- Wolf T, Debut L, Sanh V, et al. Huggingface’s transformers: State-of-the-art natural language processing[J]. ArXiv. 2019, abs/1910.03771.
- Loshchilov I, Hutter F. Decoupled weight decay regularization[J]. arXiv preprint arXiv:1711.05101. 2017.