Bi-Directional Hybrid Deep Learning Model for Speaker Identification

Wondimu Lambamo, Ramasamy Srinivasagan, Worku Jifara, Ali Alzahrani

International Journal of Advanced Science and Computer Applications (2023)

Abstract
Speaker identification is the task of automatically determining which of a set of known speakers produced a given utterance. It is crucial in voice-based authentication, forensic investigation, security and surveillance. In recent studies, combinations of a convolutional neural network (CNN) with recurrent neural network (RNN) variants have performed better than either model alone. However, only a limited number of studies have applied such combinations to speaker identification. In this study, we propose a speaker identification model that combines a two-dimensional CNN (2DCNN) with a bidirectional gated recurrent unit (BiGRU) to improve performance. The 2DCNN layers extract short-term spatial features from the input and require a limited number of parameters. The BiGRU layers capture long-term temporal dependencies between the features in both the forward and backward directions and converge efficiently during training. Spectrograms of the speech are used as input because they carry rich acoustic features of the speaker. To compare against the proposed model, additional experiments were conducted with the 2DCNN, CNN-LSTM, CNN-BiLSTM and CNN-GRU models. All experiments were conducted on the VoxCeleb1 audio dataset, which consists of 153,516 utterances collected from 1,251 speakers. The accuracy, precision, recall and F1 score of the proposed model are 98.28%, 99.08%, 98.92% and 98.97%, respectively. The proposed model was also compared with existing works. The experimental results and this comparison show that the proposed model outperforms both the existing works and the other models evaluated in this study.
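The abstract reports accuracy, precision, recall and F1 score but does not state how the per-speaker values are averaged. As a minimal sketch, assuming macro averaging (an unweighted mean over speakers, which is an assumption rather than the authors' stated procedure), the four metrics could be computed as follows. The function name `macro_metrics` and the toy labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumption: macro averaging over all speaker labels seen in either
    list. The abstract does not say which averaging the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct identification of speaker t
        else:
            fp[p] += 1   # utterance wrongly attributed to speaker p
            fn[t] += 1   # utterance of speaker t that was missed

    def safe_div(a, b):
        # Treat an undefined ratio (zero denominator) as 0.0
        return a / b if b else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy example with three hypothetical speakers; not data from the paper.
    true_ids = ["a", "a", "b", "b", "c", "c"]
    pred_ids = ["a", "a", "b", "c", "c", "c"]
    acc, prec, rec, f1 = macro_metrics(true_ids, pred_ids)
    print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Under macro averaging every speaker contributes equally regardless of how many utterances they have, which matters for a dataset such as VoxCeleb1 where utterance counts differ between speakers. A weighted or micro average would give different values.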