Application-Specific Delta-Encoding Via Resemblance Detection

USENIX ASSOCIATION PROCEEDINGS OF THE GENERAL TRACK(2003)

引用 303|浏览296
暂无评分
摘要
Many objects, such as files, electronic messages, and web pages, contain overlapping content. Numerous past research projects have observed that one can compress one object relative to another one by computing the differences between the two, but these delta-encoding systems have almost invariably required knowledge of a specific relationship between them-most commonly, two versions using the same name at different points in time. We consider cases in which this relationship is determined dynamically, by efficiently determining when a sufficient resemblance exists between two objects in a relatively large collection. We look at specific examples of this technique, namely web pages, email, and files in a file system, and evaluate the potential data reduction and the factors that influence this reduction. We find that delta-encoding using this resemblance detection technique can improve on simple compression by up to a factor of two, depending on workload, and that a small fraction of objects can potentially account for a large portion of these savings.
更多
查看译文
关键词
web pages
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要