Spoofing Detection for Speaker Verification with Glottal Flow and 1D Pure Convolutional Networks.

Antonio Camarena-Ibarrola, Karina Figueroa, Axel Plancarte Curiel

MCPR (2023)

Abstract

Automatic Speaker Verification systems are subject to attacks that aim to fool the system into accepting a claimed identity when the audio was in fact produced either by a voice conversion system, which turns the voice of an identity thief into the voice of the victim, or by a speech synthesizer whose parameters have been tuned to produce the voice of the specific individual whose identity is being stolen. A spoof detector decides whether the speech signal used to verify an individual's identity is genuine or spoofed. We use a 1D Pure Convolutional Neural Network (1DPCNN) with two classes (genuine and spoof). “Pure Convolutional” means that every layer of the network is either convolutional or pooling; there are no dense layers, so the classifier block is also made of convolutional layers, with a global max-pooling strategy. From the speech signal we detect the voiced segments, which are those produced while the vocal cords vibrate. From those voiced segments we extract the glottal flow, a signal far less complex than the speech signal and known to vary between speakers. We tested our technique on the dataset from the ASVspoof 2015 challenge, using 7,000 audio files for training, 4,000 for validation, and 30,000 for testing. We achieved an accuracy of 91.4% on the test set.
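
The "pure convolutional" idea described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' architecture: the layer counts, channel widths, kernel sizes, the random untrained weights, the helper names, and the class ordering (index 0 = genuine, index 1 = spoof) are all assumptions made for illustration. The sketch only shows how a classifier block made of a convolution followed by global max pooling yields two class scores without any dense layer.

```python
import random

def conv1d(x, weights, biases):
    """Valid 1D cross-correlation. x: [C_in][T], weights: [C_out][C_in][K]."""
    k = len(weights[0][0])
    t_out = len(x[0]) - k + 1
    out = []
    for w_o, b in zip(weights, biases):
        row = []
        for t in range(t_out):
            s = b
            for c, w_c in enumerate(w_o):
                for j in range(k):
                    s += w_c[j] * x[c][t + j]
            row.append(s)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]

def max_pool1d(x, size):
    """Non-overlapping max pooling along time; a trailing remainder is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]

def global_max_pool(x):
    """Collapse each channel to its maximum, giving one value per channel."""
    return [max(ch) for ch in x]

def init_conv(rng, c_out, c_in, k):
    """Random (untrained) weights and zero biases; illustration only."""
    w = [[[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(c_in)]
         for _ in range(c_out)]
    return w, [0.0] * c_out

def forward(signal, params):
    """Return two class scores for a single-channel 1D input signal."""
    x = [signal]  # one input channel (e.g. a glottal-flow segment)
    for w, b in params["features"]:
        x = max_pool1d(relu(conv1d(x, w, b)), 2)
    # Classifier block: a convolution with one output channel per class,
    # then global max pooling over time. No dense layer is involved.
    w, b = params["classifier"]
    return global_max_pool(conv1d(x, w, b))

def predict(signal, params):
    scores = forward(signal, params)
    # Assumed class ordering: index 0 = genuine, index 1 = spoof.
    return "genuine" if scores[0] >= scores[1] else "spoof"

rng = random.Random(0)
params = {
    "features": [init_conv(rng, 4, 1, 5), init_conv(rng, 8, 4, 3)],
    "classifier": init_conv(rng, 2, 8, 1),
}
```

Because the last step pools over the whole time axis, `forward` returns exactly two scores whether the input has 64 samples or 100, so in this sketch segments of different lengths need no padding or cropping.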

Keywords

speaker verification, glottal flow
