A Study of Intentional Voice Modifications for Evading Automatic Speaker Recognition


We investigate the effect of intentional voice modifications on a state-of-the-art speaker recognition system. The investigation includes data collection, where normal and changed voices are collected from subjects conversing by telephone. For comparison purposes, it also includes an evaluation framework similar to that for NIST extended-data speaker recognition. Results show that the state-of-the-art system gives nearly perfect recognition performance in a clean condition using normal voices. Using the threshold from this condition, it falsely rejects 39% of subjects who change their voices during testing. However, this can be improved to 9% if a threshold from the changed-voice testing condition is used. We also compare machine performance with human performance from a pilot listening experiment. Results show that machine performance is comparable to human performance when normal voices are used for both training and testing. However, the machine outperforms humans when changed voices are used for testing. In general, the results show vulnerability in both humans and speaker recognition systems to changed voices, and suggest a potential for collaboration between human analysts and automatic speaker recognition systems to address this phenomenon.
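The dependence of the false rejection rate on the chosen threshold can be illustrated with a minimal sketch. It is not the evaluation code used in the study. The scores, the two threshold values, and the function name `false_rejection_rate` are all hypothetical and chosen only to show the mechanism: a threshold tuned on normal voices rejects more genuine speakers once their scores drop under voice modification, while a threshold tuned for the changed-voice condition rejects fewer.

```python
def false_rejection_rate(target_scores, threshold):
    """Fraction of genuine (target) trials scored below the decision threshold."""
    rejected = sum(1 for score in target_scores if score < threshold)
    return rejected / len(target_scores)


# Hypothetical similarity scores for genuine-speaker trials (NOT data from the study).
normal_scores = [2.1, 2.4, 1.9, 2.8, 2.2]   # speakers using their normal voices
changed_scores = [0.4, 1.2, 2.0, 0.9, 1.6]  # same speakers with changed voices

# Hypothetical thresholds, assumed to be tuned on each condition separately.
threshold_normal = 1.5
threshold_changed = 0.5

frr_normal_voice = false_rejection_rate(normal_scores, threshold_normal)
frr_normal_thr = false_rejection_rate(changed_scores, threshold_normal)
frr_changed_thr = false_rejection_rate(changed_scores, threshold_changed)

print(f"normal voices, normal-voice threshold:   {frr_normal_voice:.0%}")
print(f"changed voices, normal-voice threshold:  {frr_normal_thr:.0%}")
print(f"changed voices, changed-voice threshold: {frr_changed_thr:.0%}")
```

Lowering the threshold in this way generally trades false rejections for false acceptances, so a real evaluation would also need to score impostor trials.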
speaker recognition, data collection, testing, speech recognition, human performance, NIST, degradation, communication channels