Spoofing Speech Detection Using High Dimensional Magnitude And Phase Features: The Ntu Approach For Asvspoof 2015 Challenge

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5(2015)

引用 134|浏览93
暂无评分
摘要
Recent improvement in text-to-speech (TTS) and voice conversion (VC) techniques presents a threat to automatic speaker verification (ASV) systems. An attacker can use the TTS or VC systems to impersonate a target speaker's voice. To overcome such a challenge, we study the detection of such synthetic speech (called spoofing speech) in this paper. We propose to use high dimensional magnitude and phase based features and long term temporal information for the task. In total, 2 types of magnitude based features and 5 types of phase based features are used. For each feature type, we build a component system using a multilayer perceptron to predict the posterior probabilities of the input features extracted from spoofing speech. The probabilities of all component systems are averaged to produce the score for final decision. When tested on the ASVspoof 2015 benchmarking task, an equal error rate (EER) of 0.29% is obtained for known spoofing types, which demonstrates the highly effectiveness of the 7 features used. For unknown spoofing types, the EER is much higher at 5.23%, suggesting that future research should be focused on improving the generalization of the techniques.
更多
查看译文
关键词
Spoofing attack, voice conversion, automatic speaker verification, phase feature, ASVspoof 2015
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要