Extending Multilingual Speech Synthesis to 100+ Languages Without Transcribed Data

IEEE International Conference on Acoustics, Speech, and Signal Processing (2024)

Abstract
Collecting high-quality studio recordings of audio is challenging, which limits the language coverage of text-to-speech (TTS) systems. This paper proposes a framework for scaling a multilingual TTS model to 100+ languages using found data without supervision. The proposed framework combines speech-text encoder pretraining with unsupervised training using untranscribed speech and unspoken text data sources, thereby leveraging massively multilingual joint speech and text representation learning. Without any transcribed speech in a new language, this TTS model can generate intelligible speech in >30 unseen languages (CER difference of <10% [...]). With just 15 minutes of transcribed, found data, we can reduce the intelligibility difference to 1% [...] that match the ground-truth in several languages.
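The intelligibility figures above are stated as character error rate (CER) differences. The abstract does not say how CER is computed, so the following is only a minimal sketch under an assumed setup: an ASR system transcribes both the synthesized and the ground-truth audio, and each transcript is scored against the reference text with a character-level Levenshtein distance. The function names (edit_distance, cer, cer_difference) are hypothetical and do not come from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER in percent: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer_difference(ref: str, asr_of_tts: str, asr_of_ground_truth: str) -> float:
    """Assumed metric: CER of synthesized speech minus CER of ground-truth speech."""
    return cer(ref, asr_of_tts) - cer(ref, asr_of_ground_truth)
```

Under this assumed definition, a difference near zero would mean that the synthesized speech is about as recognizable to the ASR system as the original recording.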
Key words
End-to-End Speech Recognition, Spoken Dialogue Systems, Automatic Speech Recognition, Statistical Language Modeling, Semantic Processing