vRead: Efficient Data Access for Hadoop in Virtualized Clouds

MIDDLEWARE(2015)

Abstract
With its unlimited scalability and on-demand access to computation and storage, a virtualized cloud platform is the perfect match for big data systems such as Hadoop. However, virtualization introduces significant overhead for I/O-intensive applications, due to device virtualization and to scheduling delays of VMs and I/O threads. In particular, device virtualization causes significant CPU overhead because I/O data must be moved across several protection boundaries. We observe that this overhead especially affects the I/O performance of the Hadoop distributed file system (HDFS). In fact, data read from an HDFS datanode VM must pass through virtual devices multiple times, incurring non-negligible virtualization overhead even when the client VM and the datanode VM are running on the same machine. In this paper, we propose vRead, a programmable framework that connects I/O flows from HDFS applications directly to their data. vRead enables direct "reads" of the disk images of datanode VMs from the hypervisor. By doing so, vRead avoids much of the device virtualization overhead, resulting in improved I/O throughput as well as CPU savings for Hadoop workloads and other applications that rely on HDFS.