Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video.

IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2017)

Abstract
This paper is part of a larger effort to detect manipulations of video by searching for and combining the evidence of multiple types of inconsistencies between the audio and visual channels. Here, we focus on inconsistencies between the type of scenes detected in the audio and visual modalities (e.g., audio indoor, small room versus visual outdoor, urban), and inconsistencies in speaker identity tracking over a video given audio speaker features and visual face features (e.g., a voice change, but no talking face change). The scene inconsistency task was complicated by mismatches in the categories used in current visual scene and audio scene collections. To deal with this, we employed a novel semantic mapping method. The speaker identity inconsistency process was challenged by the complexity of comparing face tracks and audio speech clusters, requiring a novel method of fusing these two sources. Our progress on both tasks was demonstrated on two collections of tampered videos.
Keywords
spotting audio-visual inconsistencies,SAVI,manipulated video,audio channel,visual channel,audio modality,visual modality,scene inconsistency task,audio speaker features,visual face features,semantic mapping method
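
The speaker-identity example in the abstract (a voice change with no talking-face change) can be illustrated with a toy check. This is only a minimal sketch, not the fusion method the paper describes: it assumes the video has already been split into aligned segments, and that each segment carries a hypothetical audio speaker-cluster label and a hypothetical face-track label. The function name and label values are invented for illustration.

```python
from typing import List


def find_voice_only_changes(audio_ids: List[str], face_ids: List[str]) -> List[int]:
    """Return segment indices where the audio speaker label changes
    relative to the previous segment but the face label does not.

    Assumption (not from the paper): both lists are aligned, one label
    per video segment.
    """
    if len(audio_ids) != len(face_ids):
        raise ValueError("audio_ids and face_ids must have the same length")

    flagged = []
    for i in range(1, len(audio_ids)):
        voice_changed = audio_ids[i] != audio_ids[i - 1]
        face_changed = face_ids[i] != face_ids[i - 1]
        # A new voice under an unchanged talking face is a candidate inconsistency.
        if voice_changed and not face_changed:
            flagged.append(i)
    return flagged


# Hypothetical labels for five consecutive segments.
audio = ["A", "A", "B", "B", "C"]
faces = ["f1", "f1", "f1", "f2", "f2"]
suspicious = find_voice_only_changes(audio, faces)
print(suspicious)
```

A real system would have to handle off-screen speakers, overlapping speech, and noisy clustering, none of which this sketch attempts.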