Towards Cluster-wide Deduplication Based on Ceph

2019 IEEE International Conference on Networking, Architecture and Storage (NAS), 2019

Abstract
In this paper, we design an efficient deduplication algorithm based on the distributed storage architecture of Ceph. The algorithm uses online block-level deduplication to slice the data, and it neither affects the data storage process in Ceph nor alters other Ceph interfaces and functions. Without relying on any central node, the algorithm preserves the characteristics of Ceph by designing a special hash object to store the data fingerprint, and it uses the CRUSH algorithm to detect duplicate data by calculation instead of by global search. The algorithm replaces duplicate data with deduplicated objects, which store their fingerprints and occupy less storage space. Through experimental studies, we compare the effects of different block sizes on performance and deduplication rate, and select the most appropriate block size for our prototype implementation. The experimental results show that the algorithm not only saves storage space effectively but also improves bandwidth utilization when reading and writing duplicate data.
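The idea of locating duplicates by calculation rather than by global search can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Cluster` class, the `place` function (a plain modulo standing in for CRUSH), the SHA-256 fingerprint, and the 4-byte block size are all assumptions chosen for illustration only.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size for demonstration only
NUM_NODES = 3   # hypothetical number of storage nodes


def fingerprint(block):
    """Return a content fingerprint for a block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def place(fp, num_nodes):
    """Stand-in for CRUSH: derive the node index from the fingerprint alone,
    so no central index or global search is needed to find a block."""
    return int(fp, 16) % num_nodes


class Cluster:
    """Toy cluster: each node is a dict mapping fingerprint -> block data."""

    def __init__(self, num_nodes=NUM_NODES):
        self.nodes = [dict() for _ in range(num_nodes)]

    def write(self, data, block_size=BLOCK_SIZE):
        """Slice data into fixed-size blocks and store each unique block once.
        Returns the list of fingerprints needed to rebuild the data."""
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fp = fingerprint(block)
            node = self.nodes[place(fp, len(self.nodes))]
            if fp not in node:  # duplicate check is local to the computed node
                node[fp] = block
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rebuild the data by recomputing each block's location."""
        return b"".join(
            self.nodes[place(fp, len(self.nodes))][fp] for fp in recipe
        )

    def stored_bytes(self):
        """Total bytes of block data physically stored across all nodes."""
        return sum(len(b) for node in self.nodes for b in node.values())
```

In this sketch, writing a 16-byte payload that contains only two distinct 4-byte blocks stores 8 bytes of block data, and the payload is rebuilt from its fingerprint list without consulting any central index.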
Keywords
deduplication,distributed storage system,Ceph