Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores

VLDB 2020, 2020.


Abstract:

Cloud object stores such as Amazon S3 are some of the largest and most cost-effective storage systems on the planet, making them an attractive target to store large data warehouses and data lakes. Unfortunately, their implementation as key-value stores makes it difficult to achieve ACID transactions and high performance: metadata operatio...

Introduction
  • Cloud object stores such as Amazon S3 [4] and Azure Blob Storage [17] have become some of the largest and most widely used storage systems on the planet, holding exabytes of data for millions of customers [46].
  • Although many systems support reading and writing to cloud object stores, achieving performant and mutable table storage over these systems is challenging, which makes it difficult to implement data warehousing capabilities over them.
  • Unlike distributed filesystems such as HDFS [5], or custom storage engines in a DBMS, most cloud object stores are merely key-value stores, with no cross-key consistency guarantees.
  • Their performance characteristics also differ greatly from those of distributed filesystems and require special care.
Highlights
  • Cloud object stores such as Amazon S3 [4] and Azure Blob Storage [17] have become some of the largest and most widely used storage systems on the planet, holding exabytes of data for millions of customers [46]
  • Apart from the traditional advantages of cloud services, such as pay-as-you-go billing, economies of scale, and expert management [15], cloud object stores are especially attractive because they allow users to scale computing and storage resources separately: for example, a user can store a petabyte of data but only run a cluster to execute a query over it for a few hours
  • We found that many users could avoid running a separate message bus system altogether and use a low-cost cloud object store with Delta to implement streaming pipelines with latency on the order of seconds
  • Our experience with Delta Lake shows that ACID transactions can be implemented over cloud object stores for many enterprise data processing workloads, and that they can support large-scale streaming, batch and interactive workloads
  • We have presented Delta Lake, an ACID table storage layer over cloud object stores that enables a wide range of DBMS-like performance and management features for data in low-cost cloud storage
  • Delta Lake is used at thousands of organizations to process exabytes of data per day, oftentimes replacing more complex architectures that involved multiple data management systems
Results
  • AWS i3 instances offer 237 GB of NVMe SSD storage per core at roughly 50% higher cost than the corresponding m5 instances.
Conclusion
  • The authors' experience with Delta Lake shows that ACID transactions can be implemented over cloud object stores for many enterprise data processing workloads, and that they can support large-scale streaming, batch and interactive workloads.
  • Delta’s ACID transactions allow them to update such indexes transactionally with changes to the base data. The authors have presented Delta Lake, an ACID table storage layer over cloud object stores that enables a wide range of DBMS-like performance and management features for data in low-cost cloud storage.
  • It is open source under an Apache 2 license at https://delta.io
Related work
  • Multiple research and industry projects have sought to adapt data management systems to a cloud environment. For example, Brantner et al. explored building an OLTP database system over S3 [20]; bolt-on consistency [19] implements causal consistency on top of eventually consistent key-value stores; AWS Aurora [49] is a commercial OLTP DBMS with separately scaling compute and storage layers; and Google BigQuery [29], AWS Redshift Spectrum [39] and Snowflake [23] are OLAP DBMSes that can scale computing clusters separately from storage and can read data from cloud object stores. Other work, such as the Relational Cloud project [22], considers how to automatically adapt DBMS engines to elastic, multi-tenant workloads.

    Delta Lake shares these works’ vision of leveraging widely available cloud infrastructure, but targets a different set of requirements. Specifically, most previous DBMS-on-cloud-storage systems require the DBMS to mediate interactions between clients and storage (e.g., by having clients connect to an Aurora or Redshift frontend server). This creates an additional operational burden (frontend nodes have to always be running), as well as possible scalability, availability or cost issues when streaming large amounts of data through the frontend nodes. In contrast, we designed Delta Lake so that many, independently running clients could coordinate access to a table directly through cloud object store operations, without a separately running service in most cases (except for a lightweight coordinator for log record IDs on S3, as described in §3.2.2). This design makes Delta Lake operationally simple for users and ensures highly scalable reads and writes at the same cost as the underlying object store. Moreover, the system is as highly available as the underlying cloud object store: no other components need to be hardened or restarted for disaster recovery. Of course, this design is feasible here due to the nature of the workload that Delta Lake targets: an OLAP workload with relatively few write transactions per second, but large transaction sizes, which works well with our optimistic concurrency approach.
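
    The coordination idea described above (independent clients agreeing on table state through object store operations alone, with optimistic concurrency) can be illustrated with a minimal sketch. This is not the authors' implementation: the `InMemoryObjectStore` class, the `_log/` key layout, and the `commit` helper are hypothetical names chosen for illustration, and the sketch assumes the store offers an atomic put-if-absent primitive. It also treats every concurrent commit as compatible and simply retries at the next version, whereas a real client would check for conflicts before retrying.

```python
import json
import threading


class InMemoryObjectStore:
    """Hypothetical stand-in for a cloud object store with atomic put-if-absent."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Store value under key only if the key is unused; return True on success."""
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = value
            return True

    def list_keys(self, prefix):
        """Return all keys starting with prefix, in lexicographic order."""
        with self._lock:
            return sorted(k for k in self._objects if k.startswith(prefix))


def log_key(table, version):
    # Zero-padded version numbers make lexicographic order match numeric order.
    # The "_log" directory name is an assumption made for this sketch.
    return f"{table}/_log/{version:020d}.json"


def latest_version(store, table):
    """Return the highest committed version, or -1 for an empty table."""
    keys = store.list_keys(f"{table}/_log/")
    if not keys:
        return -1
    return int(keys[-1].rsplit("/", 1)[1].split(".")[0])


def commit(store, table, actions, max_attempts=10):
    """Optimistically claim the next log version; retry if another writer won."""
    for _ in range(max_attempts):
        version = latest_version(store, table) + 1
        if store.put_if_absent(log_key(table, version), json.dumps(actions)):
            return version
        # Another client wrote this version first. A real client would check
        # here whether that commit conflicts with its own before retrying.
    raise RuntimeError("could not commit after repeated conflicts")
```

    Under these assumptions, no frontend server sits between writers and storage: the only serialization point is the put-if-absent call on the next log record, which fits a workload with few, large write transactions.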
Funding
  • Delta Lake is also supported by Google Cloud, Alibaba, Tencent, Fivetran, Informatica, Qlik, Talend, and other products [50, 26, 33].
Reference
  • Amazon Athena. https://aws.amazon.com/athena/.
  • Amazon Kinesis. https://aws.amazon.com/kinesis/.
  • Amazon Redshift. https://aws.amazon.com/redshift/.
  • Amazon S3. https://aws.amazon.com/s3/.
  • Apache Hadoop. https://hadoop.apache.org.
  • Apache Kudu. https://kudu.apache.org.
  • Apache HBase. https://hbase.apache.org.
  • Apache Hudi. https://hudi.apache.org.
  • Apache Hudi GitHub issue: Future support for multi-client concurrent write? https://github.com/apache/incubator-hudi/issues/1240.
  • Apache Iceberg. https://iceberg.apache.org.
  • Apache Kafka. https://kafka.apache.org.
  • Apache ORC. https://orc.apache.org.
  • Apache Parquet. https://parquet.apache.org.
  • M. Armbrust, T. Das, J. Torres, B. Yavuz, S. Zhu, R. Xin, A. Ghodsi, I. Stoica, and M. Zaharia. Structured streaming: A declarative API for real-time applications in Apache Spark. In SIGMOD, pages 601–613, New York, NY, USA, 2018. Association for Computing Machinery.
  • M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. A view of cloud computing. Communications of the ACM, 53:50–58, 04 2010.
  • M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: Relational data processing in Spark. In SIGMOD, 2015.
  • Azure Blob Storage. https://azure.microsoft.com/enus/services/storage/blobs/.
  • Azure Data Lake Storage. https://azure.microsoft.com/enus/services/storage/data-lake-storage/.
  • P. Bailis, A. Ghodsi, J. Hellerstein, and I. Stoica. Bolt-on causal consistency. pages 761–772, 06 2013.
  • M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. pages 251–264, 01 2008.
  • A. Conway and J. Minnick. Introducing Delta Engine. https://databricks.com/blog/2020/06/24/introducing-delta-engine.html.
  • C. Curino, E. Jones, R. Popa, N. Malviya, E. Wu, S. Madden, H. Balakrishnan, and N. Zeldovich. Relational cloud: A database-as-a-service for the cloud. In CIDR, pages 235–240, 04 2011.
  • B. Dageville, J. Huang, A. Lee, A. Motivala, A. Munir, S. Pelley, P. Povinec, G. Rahn, S. Triantafyllis, P. Unterbrunner, T. Cruanes, M. Zukowski, V. Antonov, A. Avanes, J. Bock, J. Claybaugh, D. Engovatov, and M. Hentschel. The Snowflake elastic data warehouse. pages 215–226, 06 2016.
  • P. Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and G. P. A. Group. The variant call format and VCFtools. Bioinformatics, 27(15):2156–2158, 06 2011.
  • Databricks runtime. https://databricks.com/product/databricks-runtime.
  • Delta Lake website. https://delta.io.
  • General Data Protection Regulation. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46. Official Journal of the European Union, 59:1–88, 2016.
  • Glow: An open-source toolkit for large-scale genomic analysis. https://projectglow.io.
  • Google BigQuery. https://cloud.google.com/bigquery.
  • Google Cloud Storage. https://cloud.google.com/storage.
  • Google Cloud Storage consistency documentation. https://cloud.google.com/storage/docs/consistency.
  • Hive 3 ACID documentation from Cloudera. https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-hiveql/content/hive_3_internals.html.
  • H. Jaani. New data ingestion network for Databricks: The partner ecosystem for applications, database, and big data integrations into Delta Lake. https://databricks.com/blog/2020/02/24/newdatabricks-data-ingestion-network-forapplications-database-and-big-dataintegrations-into-delta-lake.html, 2020.
  • H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16):2078–2079, Aug. 2009.
  • G. M. Morton. A computer oriented geodetic data base; and a new technique in file sequencing. IBM Technical Report, 1966.
  • S. Naik and B. Gummalla. Small files, big foils: Addressing the associated metadata and application challenges. https://blog.cloudera.com/small-files-bigfoils-addressing-the-associated-metadata-andapplication-challenges/, 2019.
  • F. A. Nothaft, M. Massie, T. Danford, Z. Zhang, U. Laserson, C. Yeksigian, J. Kottalam, A. Ahuja, J. Hammerbacher, M. Linderman, et al. Rethinking data-intensive science using scalable analytics systems. In SIGMOD, pages 631–646, New York, NY, USA, 2015. ACM.
  • OpenStack Swift. https://www.openstack.org/software/releases/train/components/swift.
  • Querying external data using Amazon Redshift Spectrum. https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html.
  • S3 consistency documentation. https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel.
  • S3 ListObjectsV2 API. https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html.
  • R. Sethi, M. Traverso, D. Sundstrom, D. Phillips, W. Xie, Y. Sun, N. Yegitbasi, H. Jin, E. Hwang, N. Shingte, and C. Berner. Presto: SQL on everything. In ICDE, pages 1802–1813, April 2019.
  • M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O’Neil, P. O’Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases, VLDB ’05, pages 553–564. VLDB Endowment, 2005.
  • C. Sudlow, J. Gallacher, N. Allen, V. Beral, P. Burton, J. Danesh, P. Downey, P. Elliott, J. Green, M. Landray, B. Liu, P. Matthews, G. Ong, J. Pell, A. Silman, A. Young, T. Sprosen, T. Peakman, and R. Collins. UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine, 12(3):1–10, 03 2015.
  • A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using Hadoop. In ICDE, pages 996–1005. IEEE, 2010.
  • M.-L. Tomsen Bukovec. AWS re:Invent 2018. Building for durability in Amazon S3 and Glacier. https://www.youtube.com/watch?v=nLyppihvhpQ, 2018.
  • Transaction Processing Performance Council. TPC benchmark DS standard specification version 2.11.0, 2019.
  • Understanding block blobs, append blobs, and page blobs. https://docs.microsoft.com/enus/rest/api/storageservices/understandingblock-blobs--append-blobs--and-page-blobs.
  • A. Verbitski, X. Bao, A. Gupta, D. Saha, M. Brahmadesam, K. Gupta, R. Mittal, S. Krishnamurthy, S. Maurice, and T. Kharatishvili. Amazon Aurora: Design considerations for high throughput cloud-native relational databases. In SIGMOD, pages 1041–1052, 05 2017.
  • R. Yao and C. Crosbie. Getting started with new table formats on Dataproc. https://cloud.google.com/blog/products/dataanalytics/getting-started-with-new-tableformats-on-dataproc.
  • M. Zaharia, A. Chen, A. Davidson, A. Ghodsi, S. A. Hong, A. Konwinski, S. Murching, T. Nykodym, P. Ogilvie, M. Parkhe, F. Xie, and C. Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41:39–45, 2018.
  • M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In NSDI, pages 15–28, 2012.