Towards an Empirical Evaluation of Scientific Data Indexing and Querying.

J. Inf. Data Manag. (2018)

Abstract
Computational simulations usually produce large amounts of data at regular time steps. Heterogeneous simulation outputs are stored in different file formats and on distinct storage devices. The main challenges for accessing simulation data therefore relate to time-to-query: the effort spent bringing all data into a common framework, issuing a high-level query statement, and obtaining the result set. Loading simulation data into a Database Management System (DBMS) is either impractical, as it demands a prohibitive amount of time for data preparation, or infeasible, as the data files are still needed in their original form (scientific applications still read and write those files). In this article, we discuss the complementary approaches of adaptive querying and raw data file indexing for accessing simulation results stored in multiple sources (e.g., raw data files) without data loading. In particular, we review (i) NoDB PostgresRAW routines for adaptive query processing, and (ii) FastBit methods for raw data file indexing and querying. We examine the behavior of both strategies on a real case study of a computational fluid dynamics simulation in the domain of sediment deposition. In this experimental evaluation, we measured the elapsed time for index construction and query processing for six distinct query categories over 62 time steps, which adds up to 372 distinct queries on 44,160 files (12.2 GB) produced by the computational simulation. Results show that FastBit is faster than PostgresRAW for query execution in all but low-selectivity query scenarios. Complementarily, the results also show that PostgresRAW outperforms FastBit whenever users are interested in reducing time-to-query rather than the query execution time itself.