Transound: Hyper-head attention transformer for birds sound recognition

Ecological Informatics (2023)

Abstract
Bird strikes in low-altitude airspace can cause severe economic losses and endanger the lives of airline passengers. Driving the relevant birds away therefore requires identifying them reliably and accurately. In this paper, we propose an effective bird identification algorithm that combines a Mel frequency cepstral coefficient (MFCC) flow framework with a vision transformer (ViT) that uses hyper-head attention. The raw sound signal is first preprocessed by pre-emphasis, framing, and windowing. The proposed MFCC flow, which consists of a discrete Fourier transform, Mel frequency filtering, and a discrete cosine transform, then extracts sound features, which are normalized into a visual dataset that downstream visual feature networks can recognize. Finally, the ViT with hyper-head attention encodes these visual features to identify the birds accurately. Extensive experiments on two public datasets show that the proposed method performs well: compared with five recent state-of-the-art approaches, the proposed Transound method achieves average improvements of 10.64%, 5.65%, 1.15%, 1.78%, and 1.51%, respectively.
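The preprocessing and MFCC flow stages described above can be sketched as follows. This is a minimal, dependency-free illustration under assumed parameters, not the authors' implementation: 16 kHz audio, a pre-emphasis coefficient of 0.97, 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point DFT, 26 triangular Mel filters, 13 cepstral coefficients, and min-max scaling to 0 to 255 as the "visual" normalization. All function names and parameter values are hypothetical, and the hyper-head attention ViT is not shown.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; the first sample passes through unchanged
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_and_window(signal, frame_len, hop):
    # Split into overlapping frames and multiply each by a Hamming window
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame, n_fft):
    # Naive O(n_fft^2) DFT of the zero-padded frame; keeps the n_fft//2 + 1 non-negative bins
    padded = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = -sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the Mel scale from 0 Hz to sr/2
    top = hz_to_mel(sr / 2)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):   # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def dct_ii(values, n_coeffs):
    # Unnormalized DCT-II, keeping only the first n_coeffs coefficients
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc_flow(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_coeffs=13):
    # Preprocessing (pre-emphasis, framing, windowing), then DFT -> Mel filtering -> log -> DCT
    frames = frame_and_window(pre_emphasis(signal), frame_len, hop)
    bank = mel_filterbank(n_filters, n_fft, sr)
    feats = []
    for frame in frames:
        spec = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
        feats.append(dct_ii(log_e, n_coeffs))
    return feats

def to_visual(feats):
    # Assumed normalization: min-max scale the frames x coefficients matrix to integers 0..255
    flat = [v for row in feats for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[round((v - lo) * scale) for v in row] for row in feats]

if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(800)]  # 50 ms, 1 kHz test tone
    img = to_visual(mfcc_flow(tone, sr))
    print(len(img), "frames x", len(img[0]), "coefficients")  # prints: 3 frames x 13 coefficients
```

The naive DFT keeps the sketch self-contained; a practical pipeline would use an FFT routine and a vectorized filterbank instead. The resulting integer matrix plays the role of the "recognizable visual feature" that a vision model would consume as an image.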
Keywords
Birds recognition, MFCC flow, Hyper-head attention, Recognizable visual feature