Integrating Online Compression to Accelerate Large-Scale Data Analytics Applications

IPDPS (2013)

Abstract
Compute cycles in high-performance systems are increasing at a much faster pace than both storage and wide-area bandwidths. To continue improving the performance of large-scale data analytics applications, compression has therefore become a promising approach. In this context, this paper makes the following contributions. First, we develop a new compression methodology, which exploits the similarities between spatial and/or temporal neighbors in a popular climate simulation dataset and enables high compression ratios and low decompression costs. Second, we develop a framework that can be used to incorporate a variety of compression and decompression algorithms. This framework also supports a simple API to allow integration with an existing application or data processing middleware. Once a compression algorithm is implemented, this framework automatically handles multi-threaded retrieval, multi-threaded data decompression, and the use of informed prefetching and caching. By integrating this framework with a data-intensive middleware, we have applied our compression methodology and framework to three applications over two datasets, including the Global Cloud-Resolving Model (GCRM) climate dataset. We obtained an average compression ratio of 51.68% and up to a 53.27% improvement in the execution time of data analysis applications by moving compressed data and thereby amortizing I/O time.
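The abstract does not spell out the compression scheme, so the following is only a minimal sketch of the general idea of exploiting similarity between neighboring values, not the paper's actual method. It assumes a one-dimensional sequence of 64-bit floats, XORs each value's bit pattern with that of its predecessor (a common lossless technique for floating-point data), and then hands the result to zlib. The function names `xor_delta_compress` and `xor_delta_decompress` are hypothetical.

```python
import struct
import zlib


def xor_delta_compress(values):
    """Losslessly compress floats by XOR-ing each value's 64-bit
    pattern with its predecessor's, then applying zlib.

    Illustrative only: this is NOT the scheme from the paper.
    """
    prev = 0
    words = []
    for v in values:
        # Reinterpret the double as an unsigned 64-bit integer.
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        # Neighbors that are numerically close tend to share their
        # leading bits (sign, exponent, high mantissa), so the XOR
        # often starts with a run of zero bits.
        words.append(bits ^ prev)
        prev = bits
    raw = struct.pack(f"<{len(words)}Q", *words)
    return zlib.compress(raw)


def xor_delta_decompress(blob):
    """Invert xor_delta_compress and return the original floats."""
    raw = zlib.decompress(blob)
    words = struct.unpack(f"<{len(raw) // 8}Q", raw)
    prev = 0
    values = []
    for w in words:
        bits = w ^ prev  # undo the XOR against the previous value
        values.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
        prev = bits
    return values


if __name__ == "__main__":
    # A smooth, slowly varying field (made-up sample data).
    field = [280.0 + 0.01 * i for i in range(1000)]
    blob = xor_delta_compress(field)
    assert xor_delta_decompress(blob) == field
    print(len(blob), "bytes compressed vs", 8 * len(field), "bytes raw")
```

Because the transform works on bit patterns rather than arithmetic differences, the round trip is exact. How much it helps depends entirely on how similar neighboring values are in the data at hand.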
Keywords
compression algorithm, new compression methodology, compression methodology, integrating online compression, average compression ratio, decompression algorithm, high compression ratio, multi-threaded data decompression, data analysis application, accelerate large-scale data analytics, large-scale data, low decompression cost, compression, information retrieval, computational modeling, climatology, middleware, compression algorithms, HPC, API, I/O, multi-threading, data models, data compression, data analysis, data management, compression ratios, meteorology, data intensive computing, big data