Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition

Sterpu George
Saam Christian

INTERSPEECH, pp. 3506-3509, 2020.

Abstract:

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset. Performance improvements range between 7% and 30% depending on the noise level when leveraging the visual modality of speech in addition to the auditory one. This work...