V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection
arXiv (2024)
Abstract
AI-generated video has revolutionized short video production, filmmaking, and
personalized media, making local video editing an essential tool. However, this
progress also blurs the line between reality and fiction, posing challenges in
multimedia forensics. To tackle this urgent issue, we propose V2A-Mark, which
addresses the limitations of current video tampering forensics, such as poor
generalizability, a single function, and a single-modality focus. Combining the
fragility of video-into-video steganography with deep robust watermarking, our
method can embed invisible visual-audio localization watermarks and copyright
watermarks into the original video frames and audio, enabling precise
manipulation localization and copyright protection. We also design a temporal
alignment and fusion module and degradation prompt learning to enhance the
localization accuracy and decoding robustness. Meanwhile, we introduce a
sample-level audio localization method and a cross-modal copyright extraction
mechanism to couple the information of audio and video frames. Experiments on
a visual-audio tampering dataset verify the effectiveness of V2A-Mark and
highlight its superiority in localization precision and copyright accuracy,
both crucial for the sustainable development of video editing in the AIGC
video era.