PBCCF: Accelerated Deduplication by Prefetching Backup Content Correlated Fingerprints

2020 IEEE 38th International Conference on Computer Design (ICCD), 2020

Abstract
Deduplication significantly accelerates large-scale storage systems, particularly backup systems, by eliminating redundancy in streaming data. Given the extraordinary growth of data, modern deduplication backup systems must identify duplicate data effectively and efficiently with limited memory for fingerprint indexing. In our observation of an enterprise backup system, a newly created client has no historical backups, so the prefetching algorithm has no reference basis for effective fingerprint prefetching. Generic prefetching approaches such as Progressive Sampling require a large amount of memory to maintain prefetching performance. In this paper, based on a study of a real-world dataset, we find that backup content correlation exists among backups from some different clients. We propose a fingerprint prefetching algorithm, Prefetching Backup Content Correlated Fingerprints (PBCCF), which improves prefetching performance by applying lightweight machine learning and statistical techniques to discover backup patterns and generalize their features using only high-level metadata. Experimental results show that PBCCF identifies highly correlated backups and fingerprints, maintaining a good deduplication rate while saving significant memory compared to Progressive Sampling.
Keywords
Deduplication, Backup systems, Fingerprint prefetching, Partial indexing, Machine learning