
The STC ASR System for the VOiCES from a Distance Challenge 2019

INTERSPEECH(2019)

Abstract
This paper describes the Speech Technology Center (STC) automatic speech recognition (ASR) system for the "VOiCES from a Distance Challenge 2019". We participated in the Fixed condition of the ASR task, which means that the only training data available was an 80-hour subset of the LibriSpeech corpus. The main difficulty of the challenge is the mismatch between clean training data and distant, noisy development/evaluation data. To tackle this, we applied room acoustics simulation and weighted prediction error (WPE) dereverberation. We also used well-known speaker adaptation with x-vector speaker embeddings, as well as novel room acoustics adaptation with R-vector room impulse response (RIR) embeddings. The system used a lattice-level combination of 6 acoustic models based on different pronunciation dictionaries and input features. N-best hypotheses were rescored with 3 neural network language models (NNLMs) trained on both words and sub-word units. NNLMs were also explored for handling out-of-vocabulary (OOV) words by means of artificial text generation. The final system achieved a word error rate (WER) of 14.7% on the evaluation data, which is the best result in the challenge.
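
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the challenge's official scoring tool or the authors' code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real scoring tools usually normalize text before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Under this definition, a WER of 14.7% corresponds to roughly 147 word errors per 1,000 reference words.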
Keywords
VOiCES19 Challenge, distant ASR, room simulation, speaker and room acoustics adaptation, R-vectors