A Lakehouse Architecture for the Management and Analysis of Heterogeneous Data for Biomedical Research and Mega-biobanks.

IEEE BigData (2021)

Abstract
The data lakehouse is a new paradigm in data architecture that embodies and integrates already-established concepts for the systematic management of disparate, large-scale data: a data lake for heterogeneous data management, open standards for high-performance querying, and systematic maintenance of data "freshness". Being a new concept, the data lakehouse remains largely a conceptual construct; projects that adopt it require maturation, empirical studies, and concrete implementations. In this paper, we present our implementation of the data lakehouse concept in the biomedical research and health data analytics domain, and we discuss several unique and novel features, such as specialized access controls in support of the HIPAA regulation and IRB protocols, and support for the FAIR standard.
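The abstract mentions specialized access controls tied to HIPAA training and IRB protocol approvals. The paper's actual mechanism is not described here, but the general idea of protocol-scoped, record-level filtering can be sketched as follows; all class and field names (`Record`, `Researcher`, `visible_records`, etc.) are illustrative assumptions, not the authors' API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    subject_id: str
    phi: dict            # protected health information fields
    protocols: frozenset  # IRB protocols covering access to this record

@dataclass(frozen=True)
class Researcher:
    name: str
    approved_protocols: frozenset  # IRB protocols the researcher is approved under
    hipaa_trained: bool

def visible_records(researcher: Researcher, records: list) -> list:
    """Return only the records this researcher may see: HIPAA training
    is required, and each record must fall under at least one of the
    researcher's approved IRB protocols."""
    if not researcher.hipaa_trained:
        return []
    return [r for r in records
            if r.protocols & researcher.approved_protocols]

# Example: a researcher approved only for IRB-001 sees only IRB-001 records.
alice = Researcher("alice", frozenset({"IRB-001"}), hipaa_trained=True)
cohort = [
    Record("s1", {"dx": "A"}, frozenset({"IRB-001"})),
    Record("s2", {"dx": "B"}, frozenset({"IRB-002"})),
]
print([r.subject_id for r in visible_records(alice, cohort)])
```

In a real lakehouse this kind of check would typically be enforced at the query layer (e.g. as row-level filters on the underlying tables) rather than in application code, so that every access path is covered.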
Keywords
biomedical research,data architectures,large-scale data,heterogeneous data management,systematic maintenance,data freshness,data lakehouse concept,health data analytics domain,lakehouse architecture,mega-biobanks,high-performance querying,HIPAA regulation,IRB protocols,FAIR standard