Towards Cluster-wide Deduplication Based on Ceph

Jinpeng Wang, Yang Wang, Hekang Wang, Kejiang Ye, Chengzhong Xu, Shuibing He, Lingfang Zeng

2019 IEEE International Conference on Networking, Architecture and Storage (NAS), 2019

Abstract

In this paper, we design an efficient deduplication algorithm based on the distributed storage architecture of Ceph. The algorithm uses online block-level deduplication to slice data into blocks, which neither affects the data storage process in Ceph nor alters its other interfaces and functions. Without relying on any central node, the algorithm preserves the characteristics of Ceph by designing a special hash object to store the data fingerprint, and uses the CRUSH algorithm to detect duplicate data by calculation instead of by global search. The algorithm replaces duplicate data with deduplicated objects, which store their fingerprints and take up less storage space. We compare the effects of different block sizes on performance and deduplication rate through experimental studies, and select the most appropriate block size for our prototype implementation. The experimental results show that the algorithm not only saves storage space effectively but also improves bandwidth utilization when reading and writing duplicate data.
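The idea of detecting duplicates by calculation rather than by global search can be illustrated with a minimal sketch. It is not the authors' implementation: the `Cluster` class, the `place` function, and the 4 KiB `BLOCK_SIZE` are hypothetical names and values chosen for illustration, and a simple hash-modulo mapping stands in for CRUSH. The point it shows is that a block's fingerprint deterministically selects one node, so only that node needs to be checked for an existing copy.

```python
import hashlib

# Hypothetical block size; the paper selects its block size experimentally.
BLOCK_SIZE = 4096


def fingerprint(block: bytes) -> str:
    """Return a content fingerprint for one block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def place(fp: str, num_nodes: int) -> int:
    """Stand-in for CRUSH: map a fingerprint to a node purely by calculation."""
    return int(fp, 16) % num_nodes


class Cluster:
    """Toy cluster with no central index; each node holds fingerprint -> block."""

    def __init__(self, num_nodes: int):
        self.nodes = [{} for _ in range(num_nodes)]
        # Each logical object is kept only as a list of block fingerprints.
        self.objects = {}

    def write(self, name: str, data: bytes) -> int:
        """Store an object and return the number of bytes newly written."""
        recipe = []
        stored = 0
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = fingerprint(block)
            node = self.nodes[place(fp, len(self.nodes))]
            # Only the computed node is consulted; no global search is needed.
            if fp not in node:
                node[fp] = block
                stored += len(block)
            recipe.append(fp)
        self.objects[name] = recipe
        return stored

    def read(self, name: str) -> bytes:
        """Rebuild an object by fetching each block from its computed node."""
        return b"".join(
            self.nodes[place(fp, len(self.nodes))][fp]
            for fp in self.objects[name]
        )
```

In this sketch, writing the same data under a second name stores no new blocks, because every fingerprint maps to a node that already holds the block. A real CRUSH mapping additionally accounts for cluster topology, weights, and replication, which the modulo stand-in ignores.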

Keywords

deduplication, distributed storage system, Ceph
