R-Vectors: New Technique for Adaptation to Room Acoustics

Yuri Khokhlov, Alexander Zatvornitskiy, Ivan Medennikov, Ivan Sorokin, Tatiana Prisyach, Aleksei Romanenko, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Mariya Korenevskaya, Oleg Petrov

INTERSPEECH (2019)

Abstract

Distant speech recognition is an important problem that is far from being solved. Reverberation and noise are among the main difficulties in this area. The most popular methods of dealing with them are data augmentation and speech enhancement. In this paper, we propose a novel approach inspired by modern methods of speaker adaptation. First, a feed-forward network is trained to classify room impulse responses (RIRs) from speech recordings. This network is then used to extract embeddings, which we call R-vectors. These R-vectors are appended to the input features of the acoustic model. Because labeled data for the RIR classification task is lacking, we propose a self-supervised method of training the network that uses artificial audio generated by a room simulator. Experimental evaluation was conducted on the VOiCES19 and AMI single-channel tasks as well as the CHiME5 multi-channel task. The R-vector-adapted ASR systems are shown to achieve up to 14% relative WER reduction. Furthermore, this gain is additive with the gains from state-of-the-art dereverberation (WPE) and speaker adaptation (x-vector) techniques.
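The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy exponential-decay RIR stands in for a real room simulator, the R-vector is a placeholder list rather than the output of a trained network, and every function name and parameter here (`synthetic_rir`, `reverberate`, `make_labeled_examples`, `append_rvector`, `rt60`, `num_samples`) is hypothetical. The sketch only shows how simulated RIRs can supply class labels without manual annotation, and how a per-utterance embedding is appended to each frame of acoustic features.

```python
import random


def synthetic_rir(rt60, sample_rate=8000, num_samples=400, seed=0):
    """Toy RIR (assumption): Gaussian noise whose amplitude decays by 60 dB at t = rt60 seconds."""
    rng = random.Random(seed)
    rir = []
    for i in range(num_samples):
        t = i / sample_rate
        decay = 10.0 ** (-3.0 * t / rt60)  # 10^-3 in amplitude corresponds to -60 dB
        rir.append(rng.gauss(0.0, 1.0) * decay)
    return rir


def reverberate(speech, rir):
    """Convolve speech with an RIR, truncated to the original speech length."""
    out = [0.0] * len(speech)
    for i, s in enumerate(speech):
        for j, h in enumerate(rir):
            if i + j >= len(out):
                break
            out[i + j] += s * h
    return out


def make_labeled_examples(speech, rirs):
    """Self-supervised labels: the index of the simulated RIR is the class label."""
    return [(reverberate(speech, rir), label) for label, rir in enumerate(rirs)]


def append_rvector(frames, rvector):
    """Append the same utterance-level embedding to every feature frame."""
    return [list(frame) + list(rvector) for frame in frames]


if __name__ == "__main__":
    rng = random.Random(1)
    speech = [rng.gauss(0.0, 1.0) for _ in range(800)]  # stand-in for a clean utterance
    rirs = [synthetic_rir(rt60, seed=k) for k, rt60 in enumerate([0.2, 0.5, 0.8])]
    examples = make_labeled_examples(speech, rirs)
    print([label for _, label in examples])

    frames = [[0.0] * 40 for _ in range(100)]  # e.g. 100 frames of 40-dim features
    rvector = [0.1] * 16                       # placeholder embedding
    adapted = append_rvector(frames, rvector)
    print(len(adapted), len(adapted[0]))
```

In a real system the labeled examples would be used to train the RIR classifier, and the embedding would come from one of its hidden layers; the embedding dimension and feature dimension used here are arbitrary.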

Key words

R-vectors, distant ASR, room acoustics adaptation, VOiCES19 Challenge, CHiME5 challenge, AMI
