Facial Expression Recognition Through Cross-Modality Attention Fusion

IEEE Transactions on Cognitive and Developmental Systems (2023)

Facial expressions are generally recognized using handcrafted or deep-learning-based features extracted from RGB facial images. However, such methods suffer under illumination and pose variations, and in particular fail to recognize expressions with weak emotion intensity. In this work, we propose a cross-modality attention-based convolutional neural network (CM-CNN) for facial expression recognition. We extract expression-related features from complementary facial images (grayscale, local binary pattern, and depth images) to handle illumination and pose variations and to capture the appearance details that characterize weak-intensity expressions. Rather than directly concatenating the complementary features, we propose a novel cross-modality attention fusion network that enhances the spatial correlations between any two types of facial images. Finally, the CM-CNN is optimized with an improved focal loss, which pays more attention to facial expressions with weak emotion intensity. The average classification accuracies on VT-KFER, BU-3DFE (P1), BU-3DFE (P2), and Bosphorus are 93.86%, 88.91%, 87.28%, and 85.16%, respectively. Evaluations on these databases demonstrate that our approach is competitive with state-of-the-art algorithms.
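The abstract does not detail the paper's "improved" focal loss, but the baseline it builds on is the standard focal loss of Lin et al. (2017). A minimal sketch of that baseline, assuming one-hot targets and pre-softmaxed class probabilities (the parameter names and values here are illustrative, not the paper's):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over predicted class probabilities.

    probs   : (N, C) array of class probabilities (rows sum to 1)
    targets : (N,) integer class labels
    gamma   : focusing parameter; gamma = 0 recovers alpha-weighted
              cross-entropy, larger gamma down-weights easy examples
    alpha   : balancing weight
    """
    eps = 1e-12  # guard against log(0)
    p_t = probs[np.arange(len(targets)), targets]  # probability of true class
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t + eps)))

# A confidently correct prediction contributes almost nothing, while a
# low-confidence one dominates -- this is the mechanism the paper adapts
# to emphasize weak-intensity expressions.
probs = np.array([[0.95, 0.05],    # easy example
                  [0.40, 0.60]])   # hard example (correct, low confidence)
targets = np.array([0, 1])
print(focal_loss(probs, targets))
```

The modulating factor (1 - p_t)^gamma is what shifts the optimization toward hard samples; the paper's variant presumably reshapes this weighting toward weak-intensity expressions specifically.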
Keywords
Convolutional neural network (CNN), cross-modality attention fusion, facial depth images, facial expression recognition (FER), focal loss