End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

Gabriel Synnaeve
Jacob Kahn
Edouard Grave
Tatiana Likhomanenko
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky

Abstract:

We study ResNet-, Time-Depth Separable ConvNet-, and Transformer-based acoustic models trained with CTC or Seq2Seq criteria. We perform experiments on the LibriSpeech dataset with and without language model (LM) decoding, optionally followed by beam rescoring. We reach 5.18% WER using external language models for decoding and rescoring. Additionally, we le...
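
The 5.18% figure above is a word error rate (WER). As a reminder of the standard definition, and not the authors' scoring pipeline (their text normalization and exact scorer are not described here), WER is the word-level Levenshtein distance between hypothesis and reference, divided by the number of reference words. Corpus-level WER pools edits and reference words across all utterances rather than averaging per-utterance rates. The function names below are hypothetical helpers for illustration.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Utterance-level WER; assumes whitespace tokenization, no normalization."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus WER: total edits over total reference words (not a mean of rates)."""
    total_edits = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    total_words = sum(len(r.split()) for r, _ in pairs)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


if __name__ == "__main__":
    # one substitution (sat -> sit) and one deletion (the): 2 edits / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% on a badly over-generating hypothesis.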
