Scanning dial: the instantaneous audio classification transformer

Huawei Jiang, Husna Mutahira, Unsang Park, Mannan Saeed Muhammad

Discover Applied Sciences (2024)

Abstract

In recent years, Transformer-based algorithms have achieved a number of remarkable results in audio classification. In the literature, sound classification typically involves analyzing audio recordings that are five seconds or longer. This raises a secondary question: can Transformers effectively classify extremely short audio samples? The main objective of this study is to apply the Transformer model to sound classification of extremely brief audio clips, with an average duration of 1.24 × 10^-2 seconds, which is too short for human recognition. In addition, a new filter is developed to obtain an instantaneous audio dataset. This filter is applied individually to the ESC-50, UrbanSound8K, AESDD, ReaLISED and RAVDESS datasets to obtain the corresponding instantaneous datasets. Moreover, a new data augmentation technique is introduced to increase classification accuracy. The proposed scheme is compared with mainstream data augmentation methods on the instantaneous audio datasets, resulting in accuracy rates of 94.16

Keywords

Data augmentation, Instantaneous audio classification, Instantaneous audio dataset, Transformer
