Are Recent Deepfake Speech Generators Detectable?
Information Hiding and Multimedia Security Workshop (2024)
Abstract
Deep learning methods can generate high-quality synthetic speech that is perceptually indistinguishable from real human speech. Synthetic speech can be used maliciously for fraud. Synthetic speech detection methods have been proposed that perform well on the ASVspoof2019 and ASVspoof2021 datasets. These datasets consist of synthetic speech from conventional neural network speech generators. Recently, many voice cloning methods have been proposed that use diffusion models and generative adversarial networks for high-quality speech synthesis. In this work, we present a new synthetic speech dataset containing 25,000 synthetic speech signals for 11 distinct speakers, with a total duration of 52 hours. We developed this dataset using 5 recent diffusion model-based synthetic speech generators. These generators can clone a speaker's voice from text using only a few minutes of the speaker's real speech. We evaluate, on this new dataset, 6 of the best-performing synthetic speech detectors on the ASVspoof2019 dataset, and report their performance using Equal Error Rate (EER).
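For readers unfamiliar with the metric, EER is the operating point at which the false acceptance rate (synthetic speech accepted as real) equals the false rejection rate (real speech rejected as synthetic). The sketch below shows one common way to estimate it from detector scores. It is illustrative only and is not taken from the paper: the function name `equal_error_rate`, the convention that higher scores mean "more likely real", and the threshold sweep over observed scores are all assumptions.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate EER from detector scores (hypothetical helper, not from the paper).

    Assumes higher scores mean "more likely real (bona fide) speech".
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: fraction of real speech scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: fraction of synthetic speech scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # With finite data the two curves rarely cross exactly, so take the
    # threshold where they are closest and average the two rates there.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])
```

An EER of 0 means the detector separates real and synthetic speech perfectly on the evaluated data, while an EER near 0.5 means it does no better than chance.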