On the consensus of synchronous temporal and spatial views: A novel multimodal deep learning method for social video prediction

Shuaiyong Xiao, Jianxiong Wang, Jiwei Wang, Runlin Chen, Gang Chen

Information Processing & Management (2024)

Abstract

The explosive growth of video social platforms has spawned a wide range of social video prediction (SVP) tasks, such as video attractiveness prediction and video sentiment classification. In this paper, we propose to enhance SVP by making synchronous predictions from temporal and spatial data perspectives and reconciling them into a consistent predictive view. To this end, we develop a novel multimodal deep learning method named MATSC (modality-awareness and temporal-spatial-consistency-based neural network). First, MATSC constructs the temporal predictive view by capturing valuable fine-grained data patterns and generating diverse multimodal representations via a modality-awareness learning strategy. Second, MATSC constructs the spatial predictive view by exploiting diverse modality-wise interactive patterns in fine-grained video clips. Third, MATSC reconciles the heterogeneous temporal and spatial predictive capabilities via a temporal-spatial-consistency learning objective. Empirical results on three SVP datasets show that MATSC outperforms state-of-the-art benchmarks, demonstrating the benefit of synergizing temporal and spatial data views for SVP tasks.
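
The abstract does not give the form of the temporal-spatial-consistency objective. As a rough illustration only, one common way to reconcile two predictive views is to add a disagreement penalty to the supervised loss of each view. The sketch below assumes a symmetric KL divergence as that penalty; the function names, the `weight` parameter, and the choice of penalty are hypothetical and are not taken from MATSC.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(max(probs[label], EPS))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(temporal_logits, spatial_logits, label, weight=0.5):
    """Supervised loss of both views plus a penalty for their disagreement.

    Assumption: the penalty is a symmetric KL divergence; the paper's
    actual objective may differ.
    """
    p_t = softmax(temporal_logits)
    p_s = softmax(spatial_logits)
    supervised = cross_entropy(p_t, label) + cross_entropy(p_s, label)
    disagreement = 0.5 * (kl_divergence(p_t, p_s) + kl_divergence(p_s, p_t))
    return supervised + weight * disagreement
```

When both views produce the same distribution, the penalty vanishes and only the supervised terms remain; the more the views diverge, the larger the added term, which pushes them toward a shared prediction during training.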

Keywords

Cross-modal-attention-based GCN, Modality-awareness learning, Social video prediction, Synchronous predictive views, Temporal-spatial consistency learning
