AnalyticDB: Real-time OLAP Database System at Alibaba Cloud

Chaoqun Zhan
Chaoqun Zhan
Maomeng Su
Maomeng Su
Chuangxian Wei
Chuangxian Wei
Xiaoqiang Peng
Xiaoqiang Peng
Liang Lin
Liang Lin
Zhe Chen
Zhe Chen
Yue Pan
Yue Pan
Fang Zheng
Fang Zheng

PVLDB, pp. 2059-2070, 2019.

Cited by: 2|Bibtex|Views102|DOI:https://doi.org/10.14778/3352063.3352124
EI
Other Links: dblp.uni-trier.de|dl.acm.org|academic.microsoft.com
Weibo:
To further improve query latency and concurrency, we enhance the optimizer and execution engine in AnalyticDB to fully utilize the advantages of our storage and indexes

Abstract:

With data explosion in scale and variety, OLAP databases play an increasingly important role in serving real-time analysis with low latency (e.g., hundreds of milliseconds), especially when incoming queries are complex and ad hoc in nature. Moreover, these systems are expected to provide high query concurrency and write throughput, and su...More

Code:

Data:

0
Introduction
  • AnalyticDB is an OLAP database system designed for high-concurrency, low-latency, and real-time analytical queries on PB scale, and has been running on 2000+ physical machines on Alibaba Cloud [1].
  • It serves external clients on.
  • Indexing is a straightforward way to improve query performance, building indexes on pre-specified columns is often no longer effective
Highlights
  • AnalyticDB is an OLAP database system designed for high-concurrency, low-latency, and real-time analytical queries on PB scale, and has been running on 2000+ physical machines on Alibaba Cloud [1]
  • To further improve query latency and concurrency, we enhance the optimizer and execution engine in AnalyticDB to fully utilize the advantages of our storage and indexes
  • This paper presents AnalyticDB, a high-concurrent, lowlatency, and real-time OLAP database at Alibaba
  • AnalyticDB extends hybrid row-column layout to support both structured and other complex-typed data that may be involved in complex queries
Methods
  • The experiments are conducted on a cluster of eight physical machines, each with an Intel Xeon Platinum 8163 CPU (@2.50GHz), 300GB memory and 3TB SSD.
  • The second table is called Orders Table, which uses order id as its primary key, and has 64 primary partitions and 10 secondary partitions.
  • These two tables are associated with each other through user id.
  • It is because Druid MUST have a timestamp column as the partition key, and queries without specifying the timestamp column are much slower [36]
Results
  • The authors evaluate AnalyticDB in both real workloads and TPC-H benchmark [11] to demonstrate AnalyticDB’s performance under different types of queries and its write capability.
Conclusion
  • This paper presents AnalyticDB, a high-concurrent, lowlatency, and real-time OLAP database at Alibaba.
  • AnalyticDB has an efficient index engine to build index for all columns asynchronously, which helps improve query performance and hide index building overhead.
  • AnalyticDB extends hybrid row-column layout to support both structured and other complex-typed data that may be involved in complex queries.
  • To provide both highthroughput write and high-concurrent query, AnalyticDB follows a read/write decoupling architecture.
  • The authors' experiments have showed that all these designs help AnalyticDB achieve better performance compared to stateof-art OLAP systems
Summary
  • Introduction:

    AnalyticDB is an OLAP database system designed for high-concurrency, low-latency, and real-time analytical queries on PB scale, and has been running on 2000+ physical machines on Alibaba Cloud [1].
  • It serves external clients on.
  • Indexing is a straightforward way to improve query performance, building indexes on pre-specified columns is often no longer effective
  • Methods:

    The experiments are conducted on a cluster of eight physical machines, each with an Intel Xeon Platinum 8163 CPU (@2.50GHz), 300GB memory and 3TB SSD.
  • The second table is called Orders Table, which uses order id as its primary key, and has 64 primary partitions and 10 secondary partitions.
  • These two tables are associated with each other through user id.
  • It is because Druid MUST have a timestamp column as the partition key, and queries without specifying the timestamp column are much slower [36]
  • Results:

    The authors evaluate AnalyticDB in both real workloads and TPC-H benchmark [11] to demonstrate AnalyticDB’s performance under different types of queries and its write capability.
  • Conclusion:

    This paper presents AnalyticDB, a high-concurrent, lowlatency, and real-time OLAP database at Alibaba.
  • AnalyticDB has an efficient index engine to build index for all columns asynchronously, which helps improve query performance and hide index building overhead.
  • AnalyticDB extends hybrid row-column layout to support both structured and other complex-typed data that may be involved in complex queries.
  • To provide both highthroughput write and high-concurrent query, AnalyticDB follows a read/write decoupling architecture.
  • The authors' experiments have showed that all these designs help AnalyticDB achieve better performance compared to stateof-art OLAP systems
Tables
  • Table1: Comparison between AnalyticDB (ADB)
  • Table2: Three kinds of queries for evaluation. Query SELECT * FROM orders ORDER BY o trade time LIMIT 10 SELECT * FROM orders WHERE o trade time BETWEEN ’2018-11-13 15:15:21’ AND ’2018-11-13 16:15:21’ AND o trade prize BETWEEN 50 AND 60 AND o seller id=9999 LIMIT 1000 SELECT o seller id, SUM(o trade prize) AS c FROM orders JOIN user ON orders.o user id = user.u id WHERE u age=10 AND o trade time BETWEEN ’2018-11-13 15:15:21’ AND ’2018-11-13 16:15:21’ GROUP BY o seller id ORDER BY c DESC LIMIT 10
  • Table3: Write throughput under different numbers of write nodes
Download tables as Excel
Related work
  • AnalyticDB is built from scratch for large-scale and realtime analysis on cloud platform. In this section, we compare AnalyticDB with other systems.

    OLTP databases. OLTP databases such as MySQL [6] and PostgreSQL [8] are designed to support transactional queries, which can be considered as point lookups that involve one or several rows. Hence, storage engines in OLTP databases are row-oriented and build B+tree index [16] to speed up query performance. However, row-store does not fit for analytical queries as these queries only require a subset of columns, where row-store significantly amplifies I/Os. Moreover, OLTP databases usually actively update indexes in the write path, which is so expensive that affects both write throughput and query latency.
Funding
  • Figure 4 shows this partition placement among read nodes, With the storage-aware optimizer (Section 5.1), this placement helps to save the cost of data redistribution by more than 80%, measured from our production service
  • As we observe in our real-world AnalyticDB service, this overhead contributes to less than 5% of the overall query latency
  • In addition, by consolidating the internal data representation between storage layer and execution engine, AnalyticDB is able to operate directly on serialized binary data rather than Java objects. This helps eliminate the overhead of serialization and de-serialization, which accounts for more than 20% of time when shuffling a large amount of data
  • With careful designs, the all-column index just consumes 66% more storage
Study subjects and analysis
trillion rows of records: 100
AnalyticDB has been successfully deployed on Alibaba Cloud to serve numerous customers (both large and small). It is capable of holding 100 trillion rows of records, i.e., 10PB+ in size. At the same time, it is able to serve 10m+ writes and 100k+ queries per second, while completing complex queries within hundreds of milliseconds

Reference
  • Alibaba Cloud. https://www.alibabacloud.com.
    Findings
  • ANTLR ASM. https://www.antlr.org.
    Findings
  • Apache ORC File. https://orc.apache.org/.
    Findings
  • Benchmarking Nearest Neighbours. https://github.com/erikbern/ann-benchmarks.
    Findings
  • Greenplum. https://greenplum.org/.
    Findings
  • MySQL. https://www.mysql.com/.
    Findings
  • Pangu. https://www.alibabacloud.com/blog/pangu—the-highperformance-distributed-file-system-by-alibaba-cloud 594059.
    Findings
  • PostgreSQL. https://www.postgresql.org/.
    Findings
  • Presto. https://prestodb.io/.
    Findings
  • Teradata Database. http://www.teradata.com.
    Findings
  • TPC-H Benchmark. http://www.tpc.org/tpch/.
    Findings
  • D. J. Abadi, S. R. Madden, and N. Hachem. Column-stores vs. row-stores: how different are they really? In SIGMOD, pages 967–980. ACM, 2008.
    Google ScholarLocate open access versionFindings
  • M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, et al. Spark sql: Relational data processing in spark. In SIGMOD, pages 1383–1394. ACM, 2015.
    Google ScholarLocate open access versionFindings
  • J. Backus. Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs. ACM, 2007.
    Google ScholarFindings
  • P. A. Bernstein and N. Goodman. Multiversion concurrency control-theory and algorithms. ACM Transactions on Database Systems (TODS), 8(4):465–483, 1983.
    Google ScholarLocate open access versionFindings
  • D. Comer. Ubiquitous b-tree. ACM Computing Surveys (CSUR), 11(2):121–137, 1979.
    Google ScholarLocate open access versionFindings
  • T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009.
    Google ScholarFindings
  • J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
    Google ScholarLocate open access versionFindings
  • A. Eisenberg, J. Melton, K. Kulkarni, J.-E. Michels, and F. Zemke. Sql: 2003 has been published. ACM SIGMOD Record, 33(1):119–126, 2004.
    Google ScholarLocate open access versionFindings
  • M. Grund, J. Kruger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. Hyrise: a main memory hybrid storage engine. PVLDB, 4(2):105–116, 2010.
    Google ScholarLocate open access versionFindings
  • A. Gupta, D. Agarwal, D. Tan, J. Kulesza, R. Pathak, S. Stefani, and V. Srinivasan. Amazon redshift and the case for simpler data warehouses. In SIGMOD, pages 1917–1923. ACM, 2015.
    Google ScholarLocate open access versionFindings
  • K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In IJCAI, pages 1312–1317, 2011.
    Google ScholarLocate open access versionFindings
  • S. Harizopoulos, V. Liang, D. J. Abadi, and S. Madden. Performance tradeoffs in read-optimized databases. In VLDB, pages 487–498. VLDB Endowment, 2006.
    Google ScholarLocate open access versionFindings
  • P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait-free coordination for internet-scale systems. In USENIX ATC, volume 8. Boston, MA, USA, 2010.
    Google ScholarLocate open access versionFindings
  • J.-F. Im, K. Gopalakrishna, S. Subramaniam, M. Shrivastava, A. Tumbde, X. Jiang, J. Dai, S. Lee, N. Pawar, J. Li, et al. Pinot: Realtime olap for 530 million users. In SIGMOD, pages 583–594. ACM, 2018.
    Google ScholarLocate open access versionFindings
  • H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell., 33(1):117–128, 2011.
    Google ScholarLocate open access versionFindings
  • F. V. Jensen. An introduction to Bayesian networks, volume 210. UCL press London, 1996.
    Google ScholarFindings
  • M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, et al. Impala: A modern, open-source sql engine for hadoop. In Cidr, volume 1, page 9, 2015.
    Google ScholarLocate open access versionFindings
  • A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandiver, L. Doshi, and C. Bear. The vertica analytic database: C-store 7 years later. PVLDB, 5(12):1790–1801, 2012.
    Google ScholarLocate open access versionFindings
  • G. M. Lohman. Grammar-like functional rules for representing query optimization alternatives, volume 17. ACM, 1988.
    Google ScholarLocate open access versionFindings
  • S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: interactive analysis of web-scale datasets. PVLDB, 3(1-2):330–339, 2010.
    Google ScholarLocate open access versionFindings
  • T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539–550, 2011.
    Google ScholarLocate open access versionFindings
  • K. Sato. An inside look at google bigquery.(2012). Retrieved Jan, 29:2018, 2012.
    Google ScholarFindings
  • M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O’Neil, et al. C-store: a column-oriented dbms. In VLDB, pages 553–564. VLDB Endowment, 2005.
    Google ScholarLocate open access versionFindings
  • A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a map-reduce framework. PVLDB, 2(2):1626–1629, 2009.
    Google ScholarLocate open access versionFindings
  • F. Yang, E. Tschetter, X. Leaute, N. Ray, G. Merlino, and D. Ganguli. Druid: A real-time analytical data store. In SIGMOD, pages 157–168. ACM, 2014.
    Google ScholarLocate open access versionFindings
  • M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, pages 2–2. USENIX Association, 2012.
    Google ScholarLocate open access versionFindings
  • Z. Zhang, C. Li, Y. Tao, R. Yang, H. Tang, and J. Xu. Fuxi: a fault-tolerant resource management and job scheduling system at internet scale. PVLDB, 7(13):1393–1404, 2014.
    Google ScholarLocate open access versionFindings
  • M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar RAM-CPU cache compression. IEEE, 2006.
    Google ScholarFindings
Full Text
Your rating :
0

 

Tags
Comments