A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance.

LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (2016)

Abstract

Current state-of-the-art speech synthesizers for domain-independent systems still struggle to generate understandable and natural-sounding speech. This is mainly because the pronunciation of words of foreign origin, inflections and compound words often cannot be handled by rules, and there are too many of these for inclusion in exception dictionaries. We describe an approach to evaluating text-to-speech synthesizers with a subjective listening experiment that differentiates between known problem classes for speech synthesizers. The target language is German, but we believe that many of the described phenomena are not language specific. We distinguish the following problem categories: Normalization, Foreign linguistics, Natural writing, Language specific and General. Each of them is divided into three to five problem classes. Word lists for each of these categories were compiled and synthesized by both a commercial and an open-source synthesizer, both based on the non-uniform unit-selection approach. The synthesized speech was evaluated by human judges using the Speechalyzer toolkit, and the results are discussed. The results show that, as expected, the commercial synthesizer performs much better than the open-source one, and that words of foreign origin in particular were pronounced badly by both systems.

Keywords

text-to-speech,evaluation,problem-classes
