Data variety, come as you are in multi-model data warehouses

Sandro Bimonte,Enrico Gallinucci,Patrick Marcel,Stefano Rizzi

Information Systems（2022）

引用 11|浏览34

暂无评分

摘要

Multi-model DBMSs (MMDBMSs) have been recently introduced to store and seamlessly query heterogeneous data (structured, semi-structured, graph-based, etc.) in their native form, aimed at effectively preserving their variety. Unfortunately, when it comes to analyzing these data, traditional data warehouses (DWs) and OLAP systems fall short because they rely on relational DBMSs for storage and querying, thus constraining data variety into the rigidity of a structured, fixed schema. In this paper, we investigate the performances of an MMDBMS when used to store multidimensional data for OLAP analyses. A multi-model DW would store each of its elements according to its native model; among the benefits we envision for this solution, that of bridging the architectural gap between data lakes and DWs, that of reducing the cost for ETL, and that of ensuring better flexibility, extensibility, and evolvability thanks to the combined use of structured and schemaless data. To support our investigation we define a multidimensional schema for the UniBench benchmark dataset and an ad-hoc OLAP workload for it. Then we propose and compare three logical solutions implemented on the PostgreSQL multi-model DBMS: one that extends a star schema with JSON, XML, graph-based, and key–value data; one based on a classical (fully relational) star schema; and one where all data are kept in their native form (no relational data are introduced). As expected, the full-relational implementation generally performs better than the multi-model one, but this is balanced by the benefits of MMDBMSs in dealing with variety. Finally, we give our perspective view of the research on this topic.

查看译文

关键词

OLAP,Multi-model databases,Data variety,Data warehouse

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要