Multi-level Latent Fusion in Learning-based Image Coding


Learning-based image coding has shown promising results for natural images compared with traditional block-based coding schemes. However, improvements are still needed for screen content coding. Most popular learning-based coding approaches are based on variational autoencoders built from Convolutional Neural Networks (CNNs) and trained end-to-end on a training dataset. In these architectures, the receptive field of the latents grows with the down-sampling ratio and the kernel size of each convolution layer. The latents coded from the last layer therefore have a large receptive field, which may not be optimal for coding sources such as screen content or mixed content containing text, logos, and small edges. This paper proposes new methods to adaptively fuse and code the latents from different layers, enabling a novel multi-level, receptive-field-based latent coding architecture that achieves better coding performance across a diverse set of contents. Additionally, entropy modeling of the latent features based on Multi-Mixture distributions and content-adaptive latent refinement in the encoder are proposed to bring further coding gains. The experimental results show that the approach can significantly improve coding efficiency for screen content, with average bitrate savings of 36%.
Keywords: Deep learning, neural network, feature pyramid network, fusion, image compression, screen content coding
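The observation that the latents' receptive field grows with the down-sampling ratio and kernel size can be made concrete with the standard recurrence for stacked convolutions. Each layer with kernel size k and stride s enlarges the receptive field by (k - 1) times the current cumulative stride (the spacing between adjacent outputs, measured in input pixels), then multiplies that stride by s. The sketch below is illustrative only: the four 5×5, stride-2 layers are an assumed configuration typical of CNN-based variational autoencoders, not the architecture used in the paper, and the helper `receptive_fields` is hypothetical. Dilation is not modeled.

```python
from typing import List, Tuple


def receptive_fields(layers: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (receptive_field, cumulative_stride) after each conv layer.

    Each layer is given as (kernel_size, stride). Padding shifts where the
    receptive field sits but does not change its size, so it is ignored.
    """
    rf, jump = 1, 1  # start from a single input pixel with unit spacing
    out = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # grow by (k - 1) steps of the current spacing
        jump *= stride             # spacing of this layer's outputs in input pixels
        out.append((rf, jump))
    return out


# Hypothetical analysis transform: four 5x5 convolutions with stride 2.
encoder = [(5, 2)] * 4
for level, (rf, stride) in enumerate(receptive_fields(encoder), start=1):
    print(f"level {level}: receptive field {rf}x{rf} px, down-sampling x{stride}")
# level 1: receptive field 5x5 px, down-sampling x2
# level 2: receptive field 13x13 px, down-sampling x4
# level 3: receptive field 29x29 px, down-sampling x8
# level 4: receptive field 61x61 px, down-sampling x16
```

Under this assumed configuration, each last-layer latent summarizes a 61×61-pixel patch, while first-level features cover only 5×5 pixels. This illustrates why latents taken from earlier layers may be better suited to fine structures such as text and thin edges.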