Predicting Content Similarity via Multimodal Modeling for Video-In-Video Advertising

IEEE Transactions on Circuits and Systems for Video Technology (2021)

Abstract
The rapid development of mobile devices has led to explosive growth of videos and online platforms, creating great demand for online advertising in videos. Existing advertising methods often select a time point at random as the insertion position, so the video content is likely unrelated to the ad content, resulting in an unsatisfactory user experience. Previous works have largely neglected the rich semantics and multimodal information available in video advertising. In contrast, we present an innovative method for video-in-video advertising based on multimodal modeling. First, different pre-trained models are used to extract multimodal representations. Then, through multimodal modeling, we learn the complementarity among these representations and obtain a unified video-level description. Finally, the unified representations of ads and videos are used to find the best matching result for each advertisement. Our method emphasizes the content similarity between ad and video, which makes the transition between video and ad more natural. Comprehensive experiments with both objective and subjective evaluations demonstrate the effectiveness and user-friendliness of the proposed video-in-video advertising framework.
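To make the described pipeline concrete (per-modality feature extraction, fusion into a unified video-level representation, and similarity-based ad-video matching), the following is a minimal sketch. The fusion strategy (mean-pooling plus concatenation), cosine similarity, and all function and variable names are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the matching pipeline described in the abstract:
# per-modality features -> unified representation -> similarity-based match.
# Encoders, fusion strategy, and all names here are assumptions for illustration.
import torch
import torch.nn.functional as F


def unify(modal_feats):
    """Fuse per-modality clip-level features (each [T, D_i]) into one
    video-level vector by mean-pooling over time and concatenating modalities."""
    pooled = [f.mean(dim=0) for f in modal_feats]          # one [D_i] vector per modality
    return F.normalize(torch.cat(pooled, dim=-1), dim=-1)  # unit-norm unified vector


def best_matching_video(ad_feats, video_feats_list):
    """Return the index of the candidate video whose unified representation
    has the highest cosine similarity with the ad's unified representation."""
    ad_vec = unify(ad_feats)
    sims = torch.stack([ad_vec @ unify(v) for v in video_feats_list])
    return int(sims.argmax()), sims


# Toy usage: two modalities (e.g. visual and audio embeddings from pre-trained
# models); random tensors stand in for real extracted features.
ad = [torch.randn(8, 512), torch.randn(8, 128)]
videos = [[torch.randn(30, 512), torch.randn(30, 128)] for _ in range(5)]
idx, scores = best_matching_video(ad, videos)
print(f"best matching video: {idx}, similarity scores: {scores.tolist()}")
```

In this sketch, ranking by similarity of unified representations stands in for the paper's content-similarity matching step; the actual method learns the cross-modal complementarity rather than simply pooling and concatenating features.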
Keywords
Video-in-video advertising, multimodal modeling, unified representation, content similarity