The Design of a Lossless Deduplication Scheme to Eliminate Fine-Grained Redundancy for JPEG Image Storage Systems.

IEEE Trans. Computers(2024)

引用 0|浏览6
Image data storage has grown explosively, so image deduplication is used to save storage by eliminating redundancy between different images. However, traditional image deduplication cannot eliminate fine-grained redundancy nor guarantee lossless results. In this work, we propose imDedup, a lossless and fine-grained deduplication scheme for JPEG image storage systems. Specifically, imDedup uses a novel sampling hash method, Feature Bitmap, to detect similar images in a fast way by utilizing the information distribution of JPEG data. Meanwhile, it uses Idelta, a novel delta encoder that incorporates image compression into deduplication, to guarantee the non-redundant data can be re-compressed via image encoding and thus improves the compression ratio. Besides, we propose the DCHash and Fixed-Point Matching (FPM) techniques to further speed up Idelta. We also propose imDedup-plus, which dynamically chooses the DCHash-based or FPM-based compressor to achieve higher throughputs without sacrificing the compression ratio. Experimental results demonstrate the superiority of the imDedup-based methods on five datasets. Compared with the state-of-the-art similarity detector and delta encoder, imDedup achieves 1.8–4.4× higher throughputs and 1.3–1.7× higher compression ratios, respectively. Besides, imDedup-plus can further achieve 1.3–2.9× higher throughputs than imDedup without sacrificing the compression ratio.
image deduplication,fine-grained deduplication,delta compression,JPEG compression,storage systems
AI 理解论文
Chat Paper