Residual networks for text-independent speaker identification: Unleashing the power of residual learning

Pooja Gambhir, Amita Dev, Poonam Bansal, Deepak Kumar Sharma, Deepak Gupta

Journal of Information Security and Applications (2024)

Abstract

The human voice is a dynamic signal that conveys information useful for speaker identification, including gender, age, emotion, and language. In the biometrics industry, identifying voices in real time amid diverse accents, tones, and noisy backgrounds is a challenging task. Voice biometry, a complex aspect of speaker identification, is gaining importance in applications such as user authentication, attendance systems, forensics, and banking operations, because it eliminates the need for traditional credentials like cards or passwords. Recent advancements in Human-Computer Interaction technology have made conversational tasks technically feasible. Deep neural learning approaches, especially Convolutional Deep Neural Networks (CDNN), have emerged as powerful tools in speech processing, surpassing traditional speaker identification methods. This paper introduces a novel approach that uses 1-dimensional convolutional residual blocks for audio classification and speaker identification, focusing specifically on speaker recognition from spoken Hindi. The proposed residual architecture significantly enhances speaker identification, even in low signal-to-noise ratio (SNR) environments, achieving an accuracy of 86.02%. This outperforms the traditional Gaussian Mixture Model (GMM) and Feed Forward Back-propagation Network (FFBN) models for the same set of speakers. Future research may explore audio classification and speaker identification using various acoustic features derived from speech signals.
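The abstract does not give the layer configuration of its 1-dimensional convolutional residual blocks, so the following is only a minimal sketch of the general idea, assuming the standard residual formulation y = ReLU(F(x) + x) with F as two same-padded 1-D convolutions. The function names (`conv1d_same`, `residual_block_1d`), the kernel shapes, and the omission of batch normalization are illustrative assumptions, not the authors' architecture.

```python
from typing import List

Signal = List[List[float]]          # layout: [channels][time]
Kernel = List[List[List[float]]]    # layout: [out_channels][in_channels][kernel_size]


def conv1d_same(x: Signal, weights: Kernel, bias: List[float]) -> Signal:
    """Stride-1 1-D convolution (cross-correlation) with zero 'same' padding.

    Assumes an odd kernel size so the output length equals the input length.
    """
    k = len(weights[0][0])
    pad = k // 2
    length = len(x[0])
    out: Signal = []
    for oc, w_oc in enumerate(weights):
        row = []
        for t in range(length):
            acc = bias[oc]
            for ic, w in enumerate(w_oc):
                for j in range(k):
                    idx = t + j - pad
                    if 0 <= idx < length:   # positions outside the signal count as zero
                        acc += w[j] * x[ic][idx]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Signal) -> Signal:
    return [[max(0.0, v) for v in channel] for channel in x]


def residual_block_1d(x: Signal, w1: Kernel, b1: List[float],
                      w2: Kernel, b2: List[float]) -> Signal:
    """Hypothetical residual block: y = ReLU(F(x) + x), F = conv -> ReLU -> conv.

    The skip connection adds the input back, so both convolutions must
    preserve the channel count and the length of x.
    """
    f = conv1d_same(relu(conv1d_same(x, w1, b1)), w2, b2)
    summed = [[fv + xv for fv, xv in zip(fc, xc)] for fc, xc in zip(f, x)]
    return relu(summed)


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0]]             # one channel, three time steps
    zero = [[[0.0, 0.0, 0.0]]]        # F(x) = 0, so only the skip path contributes
    print(residual_block_1d(x, zero, [0.0], zero, [0.0]))
```

With all-zero weights the residual branch contributes nothing and the block passes a non-negative input through unchanged, which illustrates why skip connections make it easy for a deep stack to fall back to an identity mapping. A practical model would stack several such blocks over spectrogram or waveform features and end with a classifier over speaker identities.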

Keywords

Speaker identification, Voice pattern, ResNet, Spectrograms
