
Instance-Based Transfer Learning Approach for Vietnamese Speech Synthesis with Very Low Resource

Lam Tuong Q., Nguyen Dung D., Nguyen Dat T., Lam Han K., Cai Thuc H., Hoang Suong N., Do Hao D.

Advances in Information and Communication (2022)

This paper proposes an instance-based transfer learning method for building a Vietnamese speech synthesis system from a target voice whose data set is 45 times smaller than that of the source voice. By exploiting correlated features between data objects, we can reuse all or part of the previously trained weights when retraining on the new data set. Our method consists of two stages: (1) training a DC-TTS model on the initial voice with a large data set, and (2) applying the instance-based transfer learning approach to the previously trained model to generate new voices from a small amount of recorded data. After training on only 320 sentences (about 1 h), the model can generate the new voice with high quality. Our method significantly reduces the amount of voice data needed for training and allows a new speech synthesis system to be built quickly. The MOS score of the synthesized voice is approximately the same as that of the Tacotron 2 model, but is obtained with a much smaller amount of training data. This shows that our method is a feasible way to build a Vietnamese speech synthesis system when data are limited.
Speech synthesis, Text-to-Speech, Transfer learning, Sequence-to-sequence, Vietnamese
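The abstract describes reusing all or part of the previously trained weights before retraining on the small target-voice data set. The following is a minimal sketch of that warm-start step only, not the authors' implementation: the function `warm_start`, the parameter names (`text_encoder.w`, `audio_decoder.w`), and the plain-list weight format are all hypothetical and chosen just to keep the example self-contained.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical weight format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def warm_start(
    source: Weights,
    target: Weights,
    prefixes: Optional[Iterable[str]] = None,
) -> List[str]:
    """Copy parameters from `source` into `target` where name and size match.

    If `prefixes` is given, only parameters whose name starts with one of
    the prefixes are copied (partial reuse); otherwise every matching
    parameter is copied (full reuse). Returns the names that were copied.
    """
    wanted = tuple(prefixes) if prefixes is not None else None
    copied = []
    for name, values in source.items():
        if wanted is not None and not name.startswith(wanted):
            continue  # parameter is outside the part we chose to reuse
        if name in target and len(target[name]) == len(values):
            target[name] = list(values)  # copy, so later updates stay separate
            copied.append(name)
    return copied


# Stage 1 stand-in: weights of a model trained on the large source voice.
source = {"text_encoder.w": [0.1, 0.2], "audio_decoder.w": [0.3, 0.4, 0.5]}
# Stage 2 stand-in: a freshly initialised model for the small target voice.
target = {"text_encoder.w": [0.0, 0.0], "audio_decoder.w": [0.0, 0.0, 0.0]}

# Partial reuse: only the (hypothetical) text encoder is carried over.
copied = warm_start(source, target, prefixes=["text_encoder."])
```

After this step the target model would be trained further on the target-voice recordings; calling `warm_start(source, target)` without `prefixes` corresponds to reusing the entire model.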