Excavating multimodal correlation for representation learning

Information Fusion (2023)

Abstract
A majority of previous methods for multimodal representation learning ignore the rich correlation information inherently stored in each sample, leading to a lack of robustness when trained on small datasets. Although a few contrastive learning frameworks leverage this information in a self-supervised manner, they generally encourage the intra-sample unimodal representations to be identical, neglecting the modality-specific information carried by individual modalities. In contrast, we propose a novel algorithm that learns the correlations between modalities to facilitate downstream multimodal tasks by leveraging prior information across samples, and we explore the feasibility of the proposed method on carefully designed unsupervised and supervised auxiliary learning tasks. Specifically, we construct the positive and negative sets for correlation learning from unimodal embeddings of the same sample and of different samples, respectively. A weak predictor is applied to the concatenated unimodal embeddings to learn the correspondence relationship for each set. In this way, the model can correlate unimodal features and discover the information shared across modalities. In contrast to contrastive learning methods, the proposed framework is compatible with any number of modalities and retains modality-specific information, enabling the multimodal representation to capture richer information. Moreover, in the supervised version, a key novelty is that sample labels are further utilized to learn more discriminative features: the correlation scores assigned to negative sets vary with the label differences between the associated samples. Extensive experiments suggest that the proposed method reaches state-of-the-art performance on multimodal sentiment analysis, emotion recognition, and humor detection, and can improve the performance of various fusion approaches.
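
The sketch below illustrates the correlation-learning auxiliary task described in the abstract for the two-modality case. It is not the authors' implementation; the names `WeakPredictor` and `correlation_loss`, the MLP architecture, and the soft-target rule for the supervised variant (negative-pair targets shrinking with the label gap) are assumptions introduced purely for illustration.

```python
# Minimal PyTorch sketch of correlation learning between two modalities.
# Intra-sample embedding pairs (same sample) act as positives, inter-sample
# pairs act as negatives, and a small "weak predictor" scores correspondence.
import torch
import torch.nn as nn


class WeakPredictor(nn.Module):
    """Small MLP that scores how well two unimodal embeddings correspond."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Concatenate the two unimodal embeddings and predict a correlation logit.
        return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)


def correlation_loss(z_a, z_b, predictor, labels=None):
    """Auxiliary correlation-learning loss.

    Unsupervised: intra-sample pairs (diagonal) get target 1, inter-sample
    pairs get target 0. Supervised: negative-pair targets shrink as the label
    gap between the two samples grows (a hypothetical choice of soft target).
    """
    batch = z_a.size(0)
    # Score every (sample i of modality A, sample j of modality B) pair.
    scores = predictor(
        z_a.unsqueeze(1).expand(-1, batch, -1),
        z_b.unsqueeze(0).expand(batch, -1, -1),
    )  # shape: (batch, batch)
    if labels is None:
        targets = torch.eye(batch, device=z_a.device)
    else:
        gap = (labels.unsqueeze(1) - labels.unsqueeze(0)).abs().float()
        targets = 1.0 - gap / (gap.max() + 1e-8)  # diagonal stays at 1 (gap == 0)
    return nn.functional.binary_cross_entropy_with_logits(scores, targets)


# Example: batch of 8 samples, embedding dim 32, scalar sentiment labels.
z_text, z_audio = torch.randn(8, 32), torch.randn(8, 32)
predictor = WeakPredictor(dim=32)
aux_unsup = correlation_loss(z_text, z_audio, predictor)                      # unsupervised
aux_sup = correlation_loss(z_text, z_audio, predictor, labels=torch.randn(8))  # supervised
```

In a full pipeline, such an auxiliary loss would be added to the main task loss so that the unimodal encoders learn cross-modal correspondence without being forced to produce identical representations, which is the distinction from standard contrastive objectives emphasized above.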
Keywords
Multimodal sentiment analysis, Multimodal representation learning, Correlation learning, Multimodal emotion recognition, Multimodal humor detection