Sound Event Localization and Detection Using a Spatial Omni-Dimensional Dynamic Interactions Network

Tongyang Dao, Min Guo, Miao Ma

Signal, Image and Video Processing (2024)

Abstract
To improve the performance of real-world sound event localization and detection (SELD), we developed a new architecture based on a spatial omni-dimensional dynamic interactions (SODI) network. The proposed architecture (SODI-SELD) is mainly composed of ACSmixBlock, SODBlock, and ConformerBlock. ACSmixBlock mixes self-attention, convolution, and SoftPool to extract richer channel features. SODBlock extracts adjacent features using omni-dimensional adaptive gated convolution (ODAgConv) and implements higher-order spatial interactions recursively to extract deeper channel features. These two modules improve channel feature extraction in terms of depth and breadth, while ConformerBlock improves modeling capabilities. The whole SODI-SELD architecture uses SoftPool to reduce the information lost when sound events are downsampled and uses multi-head attention to prevent overfitting during training. Experimental results on a real dataset with a maximum overlap of five show that the SODI-SELD architecture outperforms the Baseline model: the $F_{20^\circ}$ (macro) and $LR_{CD}$ metrics improve by 8.2% and 8.5%, respectively, and the $LE_{CD}$ metric decreases by $6.6^\circ$. The code is available at https://github.com/daotongyang/SODI-SELD.git.
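
The abstract credits SoftPool with reducing information loss during downsampling. As a rough illustration of that idea only, the sketch below applies SoftPool to a 1-D signal with non-overlapping windows: each output is a softmax-weighted sum of the activations in its window, so it lies between average pooling and max pooling. The function name `softpool_1d` and its `kernel_size` parameter are hypothetical, and this is not the authors' implementation, which may differ in dimensionality, stride, and placement within ACSmixBlock.

```python
import numpy as np

def softpool_1d(x, kernel_size):
    """Illustrative SoftPool over non-overlapping windows of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // kernel_size) * kernel_size  # drop any ragged tail
    windows = x[:n].reshape(-1, kernel_size)
    # Subtract the per-window maximum for numerical stability;
    # the resulting softmax weights are unchanged by this shift.
    shifted = windows - windows.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output is the softmax-weighted sum of its window.
    return (weights * windows).sum(axis=1)

pooled = softpool_1d([0.0, 0.0, 1.0, 3.0], kernel_size=2)
```

For the second window `[1.0, 3.0]`, the result falls between the average (2.0) and the maximum (3.0): larger activations dominate, but smaller ones still contribute.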
Keywords
Sound event localization and detection, Spatial omni-dimensional dynamic interactions, ConformerBlock, SoftPool