Scale And Shift Invariant Time/Frequency Representation Using Auditory Statistics: Application To Rhythm Description

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) (2016)

Abstract
In this paper we propose two novel scale- and shift-invariant time-frequency representations of audio content. Scale invariance is a desirable property for describing the rhythm of an audio signal, since it yields the same representation for the same rhythm played at different tempi. This property can be achieved by expressing the time axis on a log scale, for example using the Scale Transform (ST). Since the frequency locations of the audio content are also important, we previously extended the ST to the Modulation Scale Spectrum (MSS). However, the MSS cannot represent the interrelationships between the audio content in different frequency bands. To solve this issue, we propose two novel representations. The first is based on the 2D Scale Transform; the second is based on statistics (inspired by the auditory experiments of McDermott) that represent the interrelationships between the frequency bands. We apply both representations to a rhythm class recognition task and demonstrate their benefits. We show that introducing auditory statistics yields a large improvement in recognition results.
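The scale-invariance property mentioned in the abstract can be illustrated with a minimal sketch of the 1D Scale Transform only. It is not the authors' implementation and does not cover the 2D Scale Transform or the auditory statistics. It assumes the standard definition, in which the Scale Transform of x(t) is the Fourier transform of x(e^τ)·e^(τ/2) along log-time τ. The toy rhythm, the function names `pulse_train` and `scale_transform_magnitude`, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def pulse_train(t, onsets, width):
    """Toy rhythm (illustrative assumption): sum of Gaussian pulses at the onset times."""
    t = np.asarray(t)[:, None]
    onsets = np.asarray(onsets)[None, :]
    return np.exp(-0.5 * ((t - onsets) / width) ** 2).sum(axis=1)

def scale_transform_magnitude(signal_fn, t_min=0.05, t_max=20.0, n=8192):
    """Approximate the Scale Transform magnitude via log-time sampling and an FFT."""
    # Uniform grid in log-time tau, i.e. exponentially spaced time instants.
    tau = np.linspace(np.log(t_min), np.log(t_max), n, endpoint=False)
    t = np.exp(tau)
    # x(e^tau) * e^(tau/2): its Fourier transform along tau is the Scale Transform.
    warped = signal_fn(t) * np.sqrt(t)
    mag = np.abs(np.fft.rfft(warped))
    # Normalise so that the constant gain caused by time scaling drops out.
    return mag / np.linalg.norm(mag)

onsets = np.array([1.0, 2.0, 3.0, 4.0])  # onset times in seconds
tempo_factor = 1.5                       # same rhythm, played 1.5x faster

def slow(t):
    return pulse_train(t, onsets, width=0.05)

def fast(t):
    # Time-scaled copy of the same rhythm: x(a * t)
    return slow(tempo_factor * t)

mag_slow = scale_transform_magnitude(slow)
mag_fast = scale_transform_magnitude(fast)
max_diff = np.max(np.abs(mag_slow - mag_fast))
print("largest difference between normalised magnitudes:", max_diff)
```

A tempo change becomes a pure shift along the log-time axis, and the Fourier magnitude discards shifts, so the two normalised magnitudes should agree up to numerical error. This only holds here because both signals lie well inside the chosen `[t_min, t_max]` window.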
Keywords
2D-Fourier, 2D-Scale, Fourier-Mellin Transform, auditory statistics, rhythm description