
EMVGAN: Emotion-Aware Music-Video Common Representation Learning Via Generative Adversarial Networks

Proceedings of the 2022 International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (2022)

Abstract
Music can enhance our emotional reactions to videos and images, while videos and images can enrich our emotional response to music. Cross-modality retrieval technology can be used to recommend appropriate music for a given video and vice versa. However, the heterogeneity gap caused by the inconsistent distribution between different data modalities complicates learning the common representation space from different modalities. Accordingly, we propose an emotion-aware music-video cross-modal generative adversarial network (EMVGAN) model to build an affective common embedding space to bridge the heterogeneity gap among different data modalities. The evaluation results revealed that the proposed EMVGAN model can learn affective common representations with convincing performance while outperforming other existing models. Furthermore, the satisfactory performance of the proposed network encouraged us to undertake the music-video bidirectional retrieval task.
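The abstract describes learning a shared, emotion-aware embedding space for music and video via adversarial training. The sketch below illustrates, in PyTorch, how such a cross-modal adversarial common-representation model is typically structured: one encoder per modality projects pre-extracted features into a shared space, a discriminator tries to tell which modality an embedding came from, and the encoders are trained to fool it while also predicting a shared emotion label. All dimensions, layer sizes, loss weights, and the emotion-label setup are illustrative assumptions, not details taken from the EMVGAN paper.

```python
# Minimal sketch (PyTorch) of adversarial cross-modal common-representation learning.
# Hyperparameters and architecture choices here are assumptions for illustration only.
import torch
import torch.nn as nn

EMBED_DIM = 128                   # assumed size of the shared (common) embedding space
MUSIC_DIM, VIDEO_DIM = 512, 1024  # assumed sizes of pre-extracted modality features
NUM_EMOTIONS = 4                  # assumed number of emotion categories

class ModalityEncoder(nn.Module):
    """Projects one modality's features into the shared embedding space."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, EMBED_DIM),
        )
    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from the music or the video encoder."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, z):
        return self.net(z)  # raw logits; paired with BCEWithLogitsLoss below

music_enc, video_enc = ModalityEncoder(MUSIC_DIM), ModalityEncoder(VIDEO_DIM)
emotion_head = nn.Linear(EMBED_DIM, NUM_EMOTIONS)  # keeps embeddings emotion-aware
disc = ModalityDiscriminator()

enc_opt = torch.optim.Adam(
    list(music_enc.parameters()) + list(video_enc.parameters()) +
    list(emotion_head.parameters()), lr=1e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def train_step(music_x, video_x, emotion_y):
    """One adversarial step: the discriminator learns to separate modalities,
    the encoders learn to fool it while predicting the shared emotion label."""
    zm, zv = music_enc(music_x), video_enc(video_x)

    # 1) Update the discriminator (music -> 1, video -> 0).
    d_loss = bce(disc(zm.detach()), torch.ones(zm.size(0), 1)) + \
             bce(disc(zv.detach()), torch.zeros(zv.size(0), 1))
    disc_opt.zero_grad(); d_loss.backward(); disc_opt.step()

    # 2) Update the encoders: fool the discriminator (modality-invariant embeddings)
    #    and classify emotion correctly (emotion-aware embeddings).
    adv_loss = bce(disc(zm), torch.zeros(zm.size(0), 1)) + \
               bce(disc(zv), torch.ones(zv.size(0), 1))
    emo_loss = ce(emotion_head(zm), emotion_y) + ce(emotion_head(zv), emotion_y)
    g_loss = emo_loss + 0.1 * adv_loss  # 0.1 is an assumed weighting
    enc_opt.zero_grad(); g_loss.backward(); enc_opt.step()
    return d_loss.item(), g_loss.item()
```

Once trained, retrieval in either direction reduces to nearest-neighbor search between music and video embeddings in the shared space (e.g., by cosine similarity), which is what makes the bidirectional music-video retrieval task mentioned in the abstract possible.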