Residual networks for text-independent speaker identification: Unleashing the power of residual learning

Journal of Information Security and Applications (2024)

Abstract
The human voice is a dynamic signal that carries information useful for speaker identification, including cues to gender, age, emotion, and language. In the biometrics industry, identifying voices in real time across diverse accents, tones, and noisy backgrounds remains challenging. Voice biometry, a complex aspect of speaker identification, is gaining importance in applications such as user authentication, attendance systems, forensics, and banking operations, because it eliminates the need for traditional credentials like cards or passwords. Recent advances in Human-Computer Interaction technology have made conversational tasks technically feasible. Deep neural learning approaches, especially Convolutional Deep Neural Networks (CDNN), have emerged as powerful tools in speech processing, surpassing traditional speaker identification methods. This paper introduces a novel approach that uses 1-dimensional convolutional residual blocks for audio classification and speaker identification, focusing on speaker recognition from spoken Hindi. The proposed residual architecture improves speaker identification even in low signal-to-noise ratio (SNR) environments, achieving an accuracy of 86.02%. This outperforms the traditional Gaussian Mixture Model (GMM) and Feed Forward Back-propagation Network (FFBN) models for the same set of speakers. Future research may explore audio classification and speaker identification using various acoustic features derived from speech signals.
Keywords
Speaker identification, Voice pattern, ResNet, Spectrograms
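
The abstract names 1-dimensional convolutional residual blocks but gives no architectural details. As a rough illustration of the general idea only, the sketch below computes y = ReLU(F(x) + x) for a single-channel signal, where F is convolution, ReLU, convolution. It is not the authors' model: the function names, the single-channel setup, the kernel values, and the absence of batch normalization and learned weights are all assumptions made for this example.

```python
from typing import List

Signal = List[float]


def conv1d_same(x: Signal, kernel: Signal) -> Signal:
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(x))
    ]


def relu(x: Signal) -> Signal:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block_1d(x: Signal, k1: Signal, k2: Signal) -> Signal:
    """Hypothetical single-channel residual block: ReLU(F(x) + x).

    F(x) is conv -> ReLU -> conv. The skip connection adds the unchanged
    input back, so the block only has to learn a correction to x.
    """
    fx = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu([a + b for a, b in zip(fx, x)])


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0]      # illustrative values, not real audio
    identity = [0.0, 1.0, 0.0]   # kernel that passes the signal through
    # With identity kernels F(x) = x, so the block returns 2 * x.
    print(residual_block_1d(frame, identity, identity))  # [2.0, 4.0, 6.0]
```

With all-zero kernels the residual branch contributes nothing and the block reduces to ReLU(x), which is the property that makes deep stacks of such blocks easier to train.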