
LatentColorization: Latent Diffusion-Based Speaker Video Colorization

IEEE ACCESS(2024)

Abstract
While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Many existing video colorization techniques operate frame-by-frame, often overlooking the critical aspect of temporal coherence between successive frames. This can produce inconsistencies across frames, leading to undesirable effects such as flickering or abrupt color transitions. To address these challenges, we combine the generative capabilities of a fine-tuned latent diffusion model with an autoregressive conditioning mechanism to ensure temporal consistency in automatic speaker video colorization. We demonstrate strong improvements over existing methods on established quality metrics, namely PSNR, SSIM, FID, FVD, NIQE, and BRISQUE. Specifically, we achieve an 18% improvement when FVD is employed as the evaluation metric. Furthermore, we performed a subjective study in which users preferred LatentColorization to the existing state-of-the-art DeOldify 80% of the time. Our dataset combines conventional datasets and videos from television/movies. A short demonstration of our results can be seen in example videos available at https://youtu.be/vDbzsZdFuxM.
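The abstract's core idea, colorizing frames autoregressively so that each output is conditioned on the previously colorized frame, can be sketched as follows. This is a minimal illustration only: the `colorize_frame` stand-in below is hypothetical and uses a trivial blend in place of the paper's fine-tuned latent diffusion sampler, which is not reproduced here.

```python
import numpy as np

def colorize_frame(gray, prev_color, strength=0.6):
    """Stand-in for the colorization model (hypothetical).

    The paper uses a fine-tuned latent diffusion model; here we merely
    replicate luminance into three channels, then blend toward the
    previous output to illustrate the autoregressive conditioning that
    suppresses flicker between successive frames.
    """
    color = np.repeat(gray[..., None], 3, axis=-1).astype(np.float32)
    if prev_color is not None:
        # Autoregressive conditioning: pull the new frame toward the
        # previously colorized frame for temporal consistency.
        color = strength * color + (1.0 - strength) * prev_color
    return color

def colorize_video(gray_frames):
    """Process frames in order, feeding each output back as conditioning."""
    outputs, prev = [], None
    for gray in gray_frames:
        prev = colorize_frame(gray, prev)
        outputs.append(prev)
    return outputs
```

In the actual system the conditioning signal would enter the diffusion model's denoising process rather than a post-hoc blend, but the control flow, one pass per frame with the previous result fed back in, is the same.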
Keywords
Streaming media, Training, Image color analysis, Generative adversarial networks, Diffusion processes, Benchmark testing, Task analysis, Artificial intelligence, Artificial neural networks, Computer vision, Machine learning, Video colorization, Latent diffusion, Image colorization