Sound Event Localization and Detection Using a Spatial Omni-Dimensional Dynamic Interactions Network

Signal, Image and Video Processing (2024)

Abstract

To improve the performance of sound event localization and detection (SELD) in real scenes, we developed a new architecture based on a spatial omni-dimensional dynamic interactions (SODI) network. The proposed architecture (SODI-SELD) is mainly composed of ACSmixBlock, SODBlock, and ConformerBlock. ACSmixBlock mixes self-attention, convolution, and SoftPool to extract richer channel features. SODBlock extracts adjacent features using omni-dimensional adaptive gated convolution (ODAgConv) and implements higher-order spatial interactions recursively to extract deeper channel features. Together, these two modules improve channel feature extraction in both depth and breadth, while ConformerBlock improves modeling capability. The SODI-SELD architecture as a whole uses SoftPool to reduce the information lost when sound event features are downsampled, and uses multi-head attention to prevent overfitting during training. Experimental results on a real dataset with up to five overlapping sound events show that the SODI-SELD architecture outperforms the Baseline model: the F_20° (macro) and LR_CD metrics improve by 8.2% and 8.5%, respectively, and the LE_CD metric decreases by 6.6°. The code is available at https://github.com/daotongyang/SODI-SELD.git.
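To illustrate why SoftPool can lose less information than the usual pooling operators during downsampling, the following is a minimal sketch of the general SoftPool idea on a 1-D sequence: each window is reduced to a softmax-weighted sum of its own values, so larger activations dominate without the smaller ones being discarded. It is not taken from the SODI-SELD repository; the function name `softpool_1d` and its parameters are hypothetical, and the actual model presumably applies the operation to multi-channel feature maps.

```python
import math


def softpool_1d(values, kernel_size=2, stride=2):
    """Illustrative SoftPool: softmax-weighted sum inside each window.

    Hypothetical helper for explanation only, not the authors' implementation.
    """
    pooled = []
    for start in range(0, len(values) - kernel_size + 1, stride):
        window = values[start:start + kernel_size]
        peak = max(window)  # subtract the max before exp for numerical stability
        weights = [math.exp(v - peak) for v in window]
        total = sum(weights)
        # Each value contributes in proportion to its softmax weight.
        pooled.append(sum(w * v for w, v in zip(weights, window)) / total)
    return pooled


if __name__ == "__main__":
    features = [0.0, 2.0, 1.0, 1.0]
    print(softpool_1d(features))
```

For a window such as [0.0, 2.0], the result lies between the average (1.0) and the maximum (2.0), which is the sense in which SoftPool sits between average pooling and max pooling.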

Keywords

Sound event localization and detection, Spatial omni-dimensional dynamic interactions, ConformerBlock, SoftPool
