BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution

SC(2014)

Abstract
In today's "Big Data" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF and HDF5 to obtain enough performance to manage the vast amounts of data that scientific applications require. These I/O techniques offer parallel access to shared datasets, together with a set of optimizations such as data sieving and two-phase I/O, to boost I/O throughput. While most of these techniques focus on optimizing the access pattern on a single file or file extent, few of them consider cross-file I/O optimizations. This paper explores the potential benefit of cross-file I/O aggregation. We propose a Bundle-based PARallel Aggregation framework (BPAR) and design three partitioning schemes under this framework, aimed at improving the I/O performance of a mission-critical application, GEOS-5, as well as a broad range of other scientific applications. The results of our experiments show that BPAR achieves, on average, a 2.1x performance improvement over the baseline GEOS-5.
Keywords
Big Data, input-output programs, BPAR, HDF5, I/O performance, I/O techniques, I/O throughput, MPI-IO, access pattern, baseline GEOS-5, bundle-based parallel aggregation framework, cross-file I/O aggregation, cross-file I/O optimizations, data sieving, decoupled I/O execution, file extent, mission-critical application GEOS-5, Parallel NetCDF, partitioning schemes, scientific applications, shared datasets, single file, two-phase I/O