Frequency Axis Pooling Method for Weakly Labeled Sound Event Detection and Classification

2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) (2021)

Abstract
Recently, the convolutional recurrent neural network (CRNN) has been widely used for weakly labeled sound event detection (SED) and audio tagging (AT). However, existing network designs may not make good use of information along the frequency dimension, which can cause information loss or redundancy. We propose a frequency axis pooling method to further boost the representation power of the CRNN. Built on existing pooling functions, frequency axis pooling is applied to the feature map before it is fed into the recurrent neural network (RNN) of the CRNN. Compared with a baseline that applies no pooling along the frequency axis, our method assigns different weights to different frequency dimensions during compression, which compresses frequency information more effectively and reduces redundancy. To evaluate the proposed method, three commonly used pooling functions are compared on the frequency axis using the DCASE 2017 Task 4 dataset. The experimental results show that appropriate compression of frequency information significantly improves performance on both AT and SED. Among the three, frequency axis pooling based on linear softmax performs best on both tasks.
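The following is a minimal sketch of what pooling along the frequency axis might look like, assuming a feature map indexed as [channel][time][frequency] with non-negative activations (for example, after a ReLU). The abstract names only linear softmax; using max and average pooling as the other two functions is an assumption, and all function names here are hypothetical. Linear softmax pooling computes y = Σ x_f² / Σ x_f, which is a weighted average in which each frequency bin is weighted by its own share x_f / Σ x_f, so stronger bins contribute more. This is one way to read "assigns different weights to different frequency dimensions"; it is not the authors' code.

```python
from typing import Callable, List

# Hypothetical types: one channel is a list of time frames, each frame a list of frequency bins.
Frames = List[List[float]]          # [time][freq]
FeatureMap = List[Frames]           # [channel][time][freq]


def max_pool_freq(frames: Frames) -> List[float]:
    """Assumed baseline: keep the strongest frequency bin in each frame."""
    return [max(frame) for frame in frames]


def avg_pool_freq(frames: Frames) -> List[float]:
    """Assumed baseline: weight every frequency bin equally."""
    return [sum(frame) / len(frame) for frame in frames]


def linear_softmax_pool_freq(frames: Frames, eps: float = 1e-7) -> List[float]:
    """Linear softmax over frequency: sum(x^2) / sum(x).

    Each bin x is effectively weighted by x / sum(x), so larger activations
    get larger weights. Assumes x >= 0; eps avoids division by zero when a
    frame is all zeros (the result is then 0).
    """
    return [sum(x * x for x in frame) / (sum(frame) + eps) for frame in frames]


def pool_feature_map(feature_map: FeatureMap,
                     pool: Callable[[Frames], List[float]]) -> List[List[float]]:
    """Collapse the frequency axis of a [channel][time][freq] map.

    Returns a [time][channel] sequence, i.e. one feature vector per time
    step, which is the shape a recurrent layer would consume.
    """
    per_channel = [pool(channel) for channel in feature_map]  # [channel][time]
    n_time = len(per_channel[0])
    return [[per_channel[c][t] for c in range(len(per_channel))]
            for t in range(n_time)]


if __name__ == "__main__":
    frames = [[0.0, 1.0, 3.0], [2.0, 2.0, 2.0]]
    print(max_pool_freq(frames))             # [3.0, 2.0]
    print(avg_pool_freq(frames))             # approx. [1.333, 2.0]
    print(linear_softmax_pool_freq(frames))  # approx. [2.5, 2.0]
```

For the frame [0, 1, 3], linear softmax gives (0 + 1 + 9) / 4 = 2.5, which falls between the average (about 1.33) and the maximum (3). It leans toward the dominant bin without discarding the others entirely. When all bins are equal, the three functions agree.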
Keywords
weakly labeled sound event detection,frequency dimension,existing network design,information loss,frequency axis pooling method,CRNN,existing pooling functions,recurrent neural network input,no-pooling method,method assigns different weights,different frequency dimensions,frequency information,information redundancy,commonly used pooling functions