
LatentColorization: Latent Diffusion-Based Speaker Video Colorization

IEEE ACCESS(2024)

Abstract
While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Many existing video colorization techniques operate frame-by-frame, often overlooking the critical aspect of temporal coherence between successive frames. This can produce inconsistencies across frames, leading to undesirable effects such as flickering or abrupt color transitions. To address these challenges, we combine the generative capabilities of a fine-tuned latent diffusion model with an autoregressive conditioning mechanism to ensure temporal consistency in automatic speaker video colorization. We demonstrate strong improvements over existing methods on established quality metrics, namely PSNR, SSIM, FID, FVD, NIQE, and BRISQUE. Specifically, we achieve an 18% improvement when FVD is employed as the evaluation metric. Furthermore, we performed a subjective study in which users preferred LatentColorization to the existing state-of-the-art DeOldify 80% of the time. Our dataset combines conventional datasets and videos from television/movies. A short demonstration of our results can be seen in example videos available at https://youtu.be/vDbzsZdFuxM.
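The abstract's core idea, colorizing frames autoregressively so that each output is conditioned on the previously colorized frame, can be sketched as follows. This is a minimal illustration only: the `colorize_frame` stand-in below is hypothetical and uses a trivial blend in place of the paper's fine-tuned latent diffusion sampler, which is not reproduced here.

```python
import numpy as np

def colorize_frame(gray, prev_color, strength=0.6):
    """Stand-in for the colorization model (hypothetical).

    The paper uses a fine-tuned latent diffusion model; here we merely
    replicate luminance into three channels, then blend toward the
    previous output to illustrate the autoregressive conditioning that
    suppresses flicker between successive frames.
    """
    color = np.repeat(gray[..., None], 3, axis=-1).astype(np.float32)
    if prev_color is not None:
        # Autoregressive conditioning: pull the new frame toward the
        # previously colorized frame for temporal consistency.
        color = strength * color + (1.0 - strength) * prev_color
    return color

def colorize_video(gray_frames):
    """Process frames in order, feeding each output back as conditioning."""
    outputs, prev = [], None
    for gray in gray_frames:
        prev = colorize_frame(gray, prev)
        outputs.append(prev)
    return outputs
```

In the actual system the conditioning signal would enter the diffusion model's denoising process rather than a post-hoc blend, but the control flow, one pass per frame with the previous result fed back in, is the same.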
Keywords
Streaming media, Training, Image color analysis, Generative adversarial networks, Diffusion processes, Benchmark testing, Task analysis, Artificial intelligence, Artificial neural networks, Computer vision, Machine learning, Video colorization, Latent diffusion, Image colorization