Temporal Context Mining for Learned Video Compression

IEEE TRANSACTIONS ON MULTIMEDIA (2023)

Abstract
Applying deep learning to video compression has attracted increasing attention in recent years. In this work, we address end-to-end learned video compression with a special focus on better learning and utilizing temporal contexts. We propose to propagate not only the last reconstructed frame but also the feature from which it is reconstructed, and to mine temporal contexts from this propagated feature. From the propagated feature, we learn multi-scale temporal contexts and re-fill the learned temporal contexts into the modules of our compression scheme, including the contextual encoder-decoder, the frame generator, and the temporal context encoder. We discard the parallelization-unfriendly auto-regressive entropy model to achieve more practical encoding and decoding times. Experimental results show that our proposed scheme achieves a higher compression ratio than existing learned video codecs. Our scheme also outperforms x264 and x265 (representing industrial software for H.264 and H.265, respectively) as well as the official reference software for H.264, H.265, and H.266 (JM, HM, and VTM, respectively). Specifically, with an intra period of 32, our scheme outperforms H.265-HM by 14.4% bit rate saving when targeting PSNR, and outperforms H.266-VTM by 21.1% bit rate saving when targeting MS-SSIM.
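
The following is a minimal PyTorch sketch of the two ideas the abstract names: mining multi-scale temporal contexts from a propagated feature, and re-filling those contexts into a contextual encoder scale by scale. All class names, channel counts, and layer choices here (TemporalContextMining, ContextualEncoder, 64 channels, three scales) are hypothetical illustrations of the description above, not the authors' published architecture.

```python
import torch
import torch.nn as nn


class TemporalContextMining(nn.Module):
    """Mine temporal contexts at multiple scales from the feature
    propagated from the previous frame (hypothetical sketch)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)  # full res
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 1/2 res
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 1/4 res

    def forward(self, feature: torch.Tensor):
        c0 = self.conv0(feature)  # context at full resolution
        c1 = self.conv1(c0)       # context at half resolution
        c2 = self.conv2(c1)       # context at quarter resolution
        return c0, c1, c2


class ContextualEncoder(nn.Module):
    """Sketch of temporal context re-filling: the current frame is
    encoded conditioned on the mined contexts, concatenated
    channel-wise at the matching scale."""

    def __init__(self, channels: int = 64, latent: int = 96):
        super().__init__()
        self.down0 = nn.Conv2d(3 + channels, channels, 3, stride=2, padding=1)
        self.down1 = nn.Conv2d(2 * channels, channels, 3, stride=2, padding=1)
        self.out = nn.Conv2d(2 * channels, latent, 3, stride=2, padding=1)

    def forward(self, frame, c0, c1, c2):
        x = self.down0(torch.cat([frame, c0], dim=1))  # H/2, re-fill c0
        x = self.down1(torch.cat([x, c1], dim=1))      # H/4, re-fill c1
        return self.out(torch.cat([x, c2], dim=1))     # H/8 latent, re-fill c2


frame = torch.randn(1, 3, 64, 64)   # current frame to be coded
feat = torch.randn(1, 64, 64, 64)   # propagated feature (placeholder)
y = ContextualEncoder()(frame, *TemporalContextMining()(feat))
print(y.shape)  # torch.Size([1, 96, 8, 8])
```

Note that conditioning on contexts learned from the propagated feature, rather than on the reconstructed frame alone, is what distinguishes this design; the decoder and frame generator would consume the same contexts symmetrically.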
Key words
Video compression, Encoding, Video codecs, Entropy, Decoding, Image coding, Software, Deep neural network, end-to-end compression, learned video compression, temporal context mining, temporal context re-filling