A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation.

INTERSPEECH（2020）

引用 10|浏览8

暂无评分

摘要

Polyphone disambiguation serves as an essential part of Mandarin text-to-speech (TTS) system. However, conventional system modelling the entire Pinyin set causes the case that prediction belongs to the unrelated polyphonic character instead of the current input one, which has negative impacts on TTS performance. To address this issue, we introduce a mask-based model for polyphone disambiguation. The model takes a mask vector extracted from the context as an extra input. In our model, the mask vector not only acts as a weighting factor in Weighted-softmax to prevent the case of mis-prediction but also eliminates the contribution of non-candidate set to the overall loss. Moreover, to mitigate the uneven distribution of pronunciation, we introduce a new loss called Modified Focal Loss. The experimental result shows the effectiveness of the proposed mask-based model. We also empirically studied the impact of Weighted-softmax and Modified Focal Loss. It was found that Weighted-softmax can effectively prevent the model from predicting outside the candidate set. Besides, Modified Focal Loss can reduce the adverse impacts of the uneven distribution of pronunciation.

查看译文

关键词

polyphone disambiguation, mask vector, Weighted-softmax, Modified Focal Loss

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要