The STC ASR System for the VOiCES from a Distance Challenge 2019

Ivan Medennikov, Yuri Y. Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy

INTERSPEECH (2019)

Abstract

This paper describes the Speech Technology Center (STC) automatic speech recognition (ASR) system for the "VOiCES from a Distance Challenge 2019". We participated in the Fixed condition of the ASR task, in which the only training data available was an 80-hour subset of the LibriSpeech corpus. The main difficulty of the challenge is the mismatch between clean training data and distant, noisy development and evaluation data. To address this, we applied room acoustics simulation and weighted prediction error (WPE) dereverberation. We also used well-known speaker adaptation with x-vector speaker embeddings, as well as a novel room acoustics adaptation with R-vector room impulse response (RIR) embeddings. The system used a lattice-level combination of 6 acoustic models based on different pronunciation dictionaries and input features. N-best hypotheses were rescored with 3 neural network language models (NNLMs) trained on both words and sub-word units. NNLMs were also explored for handling out-of-vocabulary (OOV) words by means of artificial text generation. The final system achieved a word error rate (WER) of 14.7% on the evaluation data, which is the best result in the challenge.
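
For readers unfamiliar with the metric reported above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the challenge's scoring tool; the function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: assumes whitespace tokenization and no
    normalization (casing, punctuation), which real scoring tools apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5.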

Keywords

VOiCES19 Challenge, distant ASR, room simulation, speaker and room acoustics adaptation, R-vectors
