Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
ICMI, pp. 111-115, 2018.
Automatic speech recognition can potentially benefit from lip motion patterns, which complement acoustic speech to improve overall recognition performance, particularly in noise. In this paper we propose an audio-visual fusion strategy that goes beyond simple feature concatenation and learns to automatically align the two modalities, ...
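To make the contrast with plain feature concatenation concrete, the following is a minimal sketch of one generic way to align two streams with attention. It is not the architecture from the paper: it assumes scaled dot-product attention with no learned parameters, and the names `attend` and `fuse`, the toy feature values, and the shared feature dimension are all hypothetical choices made for illustration. Each audio frame scores every video frame, the scores are normalized into alignment weights, and the resulting video context is appended to the audio frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(audio_frames, video_frames):
    """For each audio frame, compute a weighted average of the video frames.

    Assumption: both streams share the same feature dimension so a plain
    dot product can score them; a trained model would typically use learned
    projections instead.
    """
    dim = len(video_frames[0])
    scale = math.sqrt(dim)
    contexts, alignments = [], []
    for a in audio_frames:
        # Similarity between this audio frame and every video frame.
        scores = [sum(x * y for x, y in zip(a, v)) / scale for v in video_frames]
        weights = softmax(scores)
        # Video context: convex combination of video frames.
        context = [
            sum(w * v[d] for w, v in zip(weights, video_frames))
            for d in range(dim)
        ]
        contexts.append(context)
        alignments.append(weights)
    return contexts, alignments


def fuse(audio_frames, video_frames):
    """Append the attended video context to each audio frame."""
    contexts, alignments = attend(audio_frames, video_frames)
    fused = [a + c for a, c in zip(audio_frames, contexts)]  # list concatenation
    return fused, alignments


# Toy data (hypothetical): 4 audio frames and 2 video frames, 2 features each.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
video = [[1.0, 0.0], [0.0, 1.0]]
fused, alignments = fuse(audio, video)
```

Because the alignment weights are computed per audio frame, the two streams do not need the same number of frames, which frame-by-frame concatenation would require.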