The Design of a Lossless Deduplication Scheme to Eliminate Fine-Grained Redundancy for JPEG Image Storage Systems.

IEEE Trans. Computers (2024)

Image data storage has grown explosively, so image deduplication is used to save storage space by eliminating redundancy between different images. However, traditional image deduplication can neither eliminate fine-grained redundancy nor guarantee lossless results. In this work, we propose imDedup, a lossless and fine-grained deduplication scheme for JPEG image storage systems. Specifically, imDedup uses a novel sampling hash method, Feature Bitmap, to detect similar images quickly by exploiting the information distribution of JPEG data. It also uses Idelta, a novel delta encoder that incorporates image compression into deduplication, to guarantee that non-redundant data can be re-compressed via image encoding, thereby improving the compression ratio. In addition, we propose the DCHash and Fixed-Point Matching (FPM) techniques to further speed up Idelta. We also propose imDedup-plus, which dynamically chooses between the DCHash-based and FPM-based compressors to achieve higher throughput without sacrificing the compression ratio. Experimental results on five datasets demonstrate the superiority of the imDedup-based methods. Compared with the state-of-the-art similarity detector and delta encoder, imDedup achieves 1.8–4.4× higher throughput and 1.3–1.7× higher compression ratios, respectively. Moreover, imDedup-plus achieves 1.3–2.9× higher throughput than imDedup without sacrificing the compression ratio.
image deduplication, fine-grained deduplication, delta compression, JPEG compression, storage systems
