Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation

INTERSPEECH(2019)

引用 0|浏览46
暂无评分
摘要
Audio tagging aims to identify the presence or absence of audio events in the audio clip. Recently, a lot of researchers have paid attention to explore different model structures to improve the performance of audio tagging. Convolutional neural network (CNN) is the most popular choice among a wide variety of model structures, and it's successfully applied to audio events prediction task. However, the model complexity of CNN is relatively high, which is not efficient enough to ship in real product. In this paper, compact Feedforward Sequential Memory Network (cFSMN) is proposed for audio tagging task. Experimental results show that cFSMN-based system yields a comparable performance with the CNN-based system. Meanwhile, an audio-to-audio ratio (AAR) based data augmentation method is proposed to further improve the classifier performance. Finally, with raw waveforms of the balanced training set of Audio Set which is a published standard database, our system can achieve a state-of-the-art performance with AUC being 0.932. Moreover, cFSMN-based model has only 1.9 million parameters, which is only about 1/30 of the CNN-based model.
更多
查看译文
关键词
Audio Set, audio tagging, compact feedforward sequential memory network, audio-to-audio ratio, data augmentation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要