IndexIt - Enhancing Data Locating Services for Parallel File Systems.

HPCC/SmartCity/DSS(2019)

Abstract
The ability to access a small fraction of data records from a large volume of scientific datasets is vital to accelerating scientific discovery. Existing parallel file systems, however, face serious challenges in managing scientific big data, because data services have traditionally been decoupled from file systems. In this paper, we present IndexIt, an in-situ index and query middleware that aims to enhance record locating services for parallel file systems. When applications write data to a parallel file system, IndexIt lets users index the data while it is still in memory, bypassing the performance bottleneck between memory and disk. By applying a lightweight Bitmap-Range index and organizing the index as key-value pairs, IndexIt accelerates index building. Moreover, we propose a two-level query processing framework to process query requests. We build a prototype and evaluate its performance with real scientific datasets. Compared with existing data management tools, IndexIt dramatically reduces index building time. The proposed two-level query processing achieves a speedup of up to three orders of magnitude over scanning the entire dataset and a speedup of up to 1.35x over an existing tool.
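The abstract does not spell out the internals of the Bitmap-Range index or of the two-level query framework, so the following is only an illustrative sketch of the general idea, not IndexIt's implementation. It assumes a generic binned bitmap index in which each key is a value-range bin and each value is a bitmap of record positions, and it assumes that "two-level" means an index lookup followed by a candidate check on boundary bins. The names `build_index`, `positions`, `query`, and `bin_width` are hypothetical.

```python
from collections import defaultdict


def build_index(values, bin_width):
    """Build a binned bitmap index as key-value pairs (assumed layout).

    Key: bin id, covering the half-open range
         [bin_id * bin_width, (bin_id + 1) * bin_width).
    Value: a bitmap stored as a Python int; bit i is set when record i
           falls into that bin.
    """
    index = defaultdict(int)
    for pos, v in enumerate(values):
        bin_id = int(v // bin_width)
        index[bin_id] |= 1 << pos
    return dict(index)


def positions(bitmap):
    """Return the positions of the set bits in a bitmap, in ascending order."""
    out = []
    pos = 0
    while bitmap:
        if bitmap & 1:
            out.append(pos)
        bitmap >>= 1
        pos += 1
    return out


def query(index, values, bin_width, lo, hi):
    """Return sorted positions of records with lo <= value < hi.

    Level 1 (assumed): bins lying fully inside [lo, hi) are answered from
    the bitmaps alone, without touching the raw data.
    Level 2 (assumed): bins that only partially overlap the range yield
    candidates, which are checked against the raw values.
    """
    lo_bin = int(lo // bin_width)
    hi_bin = int(hi // bin_width)
    sure, maybe = 0, 0
    for bin_id, bitmap in index.items():
        if bin_id < lo_bin or bin_id > hi_bin:
            continue  # bin cannot overlap the query range
        bin_start = bin_id * bin_width
        bin_end = bin_start + bin_width
        if lo <= bin_start and bin_end <= hi:
            sure |= bitmap   # every record in this bin matches
        else:
            maybe |= bitmap  # boundary bin: candidates need checking
    hits = positions(sure)
    hits += [p for p in positions(maybe) if lo <= values[p] < hi]
    return sorted(hits)


# Usage: index six in-memory records, then run a range query.
records = [0.5, 3.2, 7.9, 4.0, 9.5, 2.1]
idx = build_index(records, bin_width=2.0)
matches = query(idx, records, 2.0, lo=3.0, hi=7.9)
```

In this sketch only the boundary bins require reading raw values, which is the usual reason a binned index avoids a full scan; how IndexIt actually divides work between its two levels is described in the paper itself.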
Keywords
High Performance Computing, Big Data, Data Management, Index