Deep Convolutional Neural Network With Scalogram For Audio Scene Modeling

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES(2018)

引用 23|浏览31
暂无评分
摘要
Deep learning has improved the performance of acoustic scene classification recently. However, learning is usually based on short-time Fourier transform and hand-tailored filters. Learning directly from raw signals has remained a big challenge. In this paper, we proposed an approach to learning audio scene patterns from scalogram, which is extracted from raw signal with simple wavelet transforms. The experiments were conducted on DCASE2016 dataset. We compared scalogram with classical Mel energy, which showed that multi-scale feature led to an obvious accuracy increase. The convolutional neural network integrated with maximum-average downsampled scalogram achieved an accuracy of 90.5% in the evaluation step in DCASE2016.
更多
查看译文
关键词
Acoustic scene classification, Scalogram, Convolutional neural network, DCASE2016
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要