A Focused Garbage Collection Approach for Primary Deduplicated Storage with Low Memory Overhead

2022 IEEE 40th International Conference on Computer Design (ICCD), 2022

Abstract
Because a single chunk can be shared by many files after data deduplication, Garbage Collection (GC) is an essential but complex task for reclaiming stale chunks in large-scale primary deduplication systems. Traditional Mark&Sweep is a widely used approach, but in the Mark phase it suffers from ever-increasing traversal time and the huge memory overhead of the Liveness Array (i.e., a data structure that records which chunks are still alive). This paper proposes a new method named Focused Garbage Collection (FGC) that significantly accelerates the Mark phase for primary deduplication storage. Specifically, we design a global Austere Reference Graph with low memory cost that efficiently represents the reference relationships among files (i.e., the chunks they share after deduplication) by exploiting the deduplication characteristics of primary-storage workloads. The Austere Reference Graph lets FGC focus on the deleted files and their correlated files to quickly mark stale chunks, whereas traditional approaches must traverse all files. Consequently, FGC greatly reduces both the traversal time and the Liveness Array size in the Mark phase. Evaluation results show that, compared with traditional Mark&Sweep, FGC reduces Mark-phase time by 1.3× to 7.34× in a stand-alone primary deduplication system and reduces Mark-phase network traffic by 128× to 256×, while introducing less than 0.05% extra memory overhead for the reference graph.
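To make the contrast between full-traversal marking and focused marking concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's implementation: the data model (each file is a list of chunk fingerprints) and the helpers `mark_and_sweep`, `build_reference_graph`, and `focused_mark` are hypothetical, and the graph here is a plain, exact file-to-file adjacency map rather than the memory-optimized Austere Reference Graph, whose internal layout the abstract does not describe. The key observation it illustrates is that a chunk of a deleted file can only stay alive if some surviving file also references it, and every such file is a neighbor of the deleted file in the reference graph.

```python
from collections import defaultdict

# Hypothetical data model (assumption): file name -> list of chunk fingerprints.
FileMap = dict[str, list[str]]


def mark_and_sweep(files: FileMap, all_chunks: set[str], deleted: set[str]) -> set[str]:
    """Traditional Mark&Sweep: the Mark phase traverses EVERY surviving file."""
    live: set[str] = set()  # stands in for the Liveness Array
    for name, chunks in files.items():
        if name not in deleted:
            live.update(chunks)
    return all_chunks - live  # Sweep: every unmarked chunk is stale


def build_reference_graph(files: FileMap) -> dict[str, set[str]]:
    """Exact (not austere) graph: an edge joins two files sharing >= 1 chunk."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, chunks in files.items():
        for chunk in chunks:
            owners[chunk].add(name)
    graph: dict[str, set[str]] = defaultdict(set)
    for sharing in owners.values():
        for f in sharing:
            graph[f].update(sharing - {f})
    return graph


def focused_mark(files: FileMap, graph: dict[str, set[str]], deleted: set[str]) -> set[str]:
    """Focused marking: visit only deleted files and their correlated neighbors."""
    # Only chunks of deleted files can become stale in this GC round.
    candidates: set[str] = set()
    for f in deleted:
        candidates.update(files[f])
    # Correlated files: surviving neighbors of any deleted file.
    correlated: set[str] = set()
    for f in deleted:
        correlated.update(graph.get(f, set()))
    correlated -= deleted
    # Mark candidate chunks still referenced by a correlated file.
    live = {c for f in correlated for c in files[f] if c in candidates}
    return candidates - live


files: FileMap = {
    "a.txt": ["c1", "c2", "c3"],
    "b.txt": ["c2", "c4"],  # shares c2 with a.txt
    "c.txt": ["c5"],        # unrelated to a.txt
}
all_chunks = {c for chunks in files.values() for c in chunks}
graph = build_reference_graph(files)
deleted = {"a.txt"}

print(sorted(mark_and_sweep(files, all_chunks, deleted)))  # ['c1', 'c3']
print(sorted(focused_mark(files, graph, deleted)))         # ['c1', 'c3']
```

Both functions report the same stale chunks, but the focused version reads only `b.txt` (the one file correlated with `a.txt`) and never touches `c.txt`, and its liveness set is bounded by the deleted files' chunks rather than by all live chunks. How closely this mirrors FGC's savings depends on how much sharing the workload has, which is why the paper ties its graph design to primary-storage deduplication characteristics.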
Keywords
Deduplication, Garbage Collection, Overhead