Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices As Input Depth Of Convolutional Neural Networks

2017 AES INTERNATIONAL CONFERENCE ON SEMANTIC AUDIO(2017)

引用 23|浏览7
暂无评分
摘要
In this paper, we propose a new representation as input of a Convolutional Neural Network with the goal of estimating music structure boundaries. For this task, previous works used a network performing the late-fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity-lag-matrix. We propose here to use the squaresub-matrices centered on the main diagonals of several self-similarity-matrices, each one representing a different audio descriptors. We propose to combine them using the depth of the input layer. We show that this representation improves the results over the use of the self-similarity-lag-matrix. We also show that using the depth of the input layer provide a convenient way for early fusion of audio representations.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要