X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing

Gui Huang
Gui Huang
Jianying Wang
Jianying Wang
Dengcheng He
Dengcheng He
Tieying Zhang
Tieying Zhang
Qiang Li
Qiang Li

Proceedings of the 2019 International Conference on Management of Data, pp. 651-665, 2019.

Cited by: 12|Bibtex|Views265|DOI:https://doi.org/10.1145/3299869.3314041
EI
Other Links: dl.acm.org|dblp.uni-trier.de|academic.microsoft.com
Weibo:
We introduce X-Engine, an OLTP storage engine, optimized for the largest e-commerce platform in the world at Alibaba, serving more than 600 million active customers globally

Abstract:

Alibaba runs the largest e-commerce platform in the world serving more than 600 million customers, with a GMV (gross merchandise value) exceeding USD 768 billion in FY2018. Online e-commerce transactions have three notable characteristics: (1) drastic increase of transactions per second with the kickoff of major sales and promotion events...More

Code:

Data:

0
Introduction
  • Alibaba runs the world’s largest and busiest e-commerce platform consisting of its consumer-to-consumer retail market Taobao, business-to-business market Tmall, and other online markets, serving more than 600 million active consumers with a GMV exceeding USD 768 billion in FY2018.
  • Such online shopping markets have created new ways of shopping and selling.
Highlights
  • Alibaba runs the world’s largest and busiest e-commerce platform consisting of its consumer-to-consumer retail market Taobao, business-to-business market Tmall, and other online markets, serving more than 600 million active consumers with a GMV exceeding USD 768 billion in FY2018
  • To embrace this 122-time spike shown in Figure 1, Alibaba adopts a shared-nothing architecture for OLTP databases where sharding is applied to distribute transactions among many database instances, and scales the number of database instances up before the spike comes. It takes significant monetary and engineering costs due to the sheer number of instances required. We address this problem by improving the single-machine capacity of storage engines, a core component of OLTP databases, so that the number of instances required for a given spike and throughput is reduced, or the achievable throughput given a fixed cost is increased
  • We find that any one of these methods alone is not sufficient for serving such ecommerce transactions: some assume a columnar storage that is not suitable for write-intensive transactions; others trade the performance of point and range queries to improve that of writes, which are not suitable for mixed read and write e-commerce workloads
  • We identify three major challenges for an OLTP storage engine in processing e-commerce transactions, and design X-Engine based on the LSM-tree structure
  • We first show how X-Engine performs when processing e-commerce OLTP workloads with mixtures of different types of queries
  • We introduce X-Engine, an OLTP storage engine, optimized for the largest e-commerce platform in the world at Alibaba, serving more than 600 million active customers globally
Methods
  • The authors compare X-Engine with two other popular storage engines: InnoDB [26] and RocksDB [15].
  • InnoDB is the default storage engine of MySQL.
  • RocksDB is a widely used LSM-tree storage engine, developed from LevelDB [16].
  • At Alibaba, the authors use both InnoDB and X-Engine with MySQL 5.7 to process e-commerce transactions in many of the database clusters.
  • MySQL 5.7 is a natural choice to evaluate InnoDB and X-Engine.
  • The authors used the latest release (Nov 2018) of RocksDB [14], and MyRocks (MySQL on RocksDB) [11]
Results
  • The authors first show how X-Engine performs when processing e-commerce OLTP workloads with mixtures of different types of queries.
  • The authors have already measured the real-world performance of MySQL databases running X-Engine during the Singles’ Day in Figure 1
Conclusion
  • The authors introduce X-Engine, an OLTP storage engine, optimized for the largest e-commerce platform in the world at Alibaba, serving more than 600 million active customers globally.
  • Online e-commerce transactions bring some distinct challenges, especially during the annual Singles’ Day Global Shopping Festival when customers shop extensively in a short period of time.
  • X-Engine has outperformed other storage engines processing the e-commerce workloads during online promotions and successfully served the Singles’ Day Shopping Festival.
  • The authors' ongoing and future work include adopting a shared storage design to improve its scalability for shared-nothing distributed transaction processing under sharding, and applying machine learning methods to predict data temperatures in order to facilitate intelligent scheduling of record placements in a tiered storage
Summary
  • Introduction:

    Alibaba runs the world’s largest and busiest e-commerce platform consisting of its consumer-to-consumer retail market Taobao, business-to-business market Tmall, and other online markets, serving more than 600 million active consumers with a GMV exceeding USD 768 billion in FY2018.
  • Such online shopping markets have created new ways of shopping and selling.
  • Methods:

    The authors compare X-Engine with two other popular storage engines: InnoDB [26] and RocksDB [15].
  • InnoDB is the default storage engine of MySQL.
  • RocksDB is a widely used LSM-tree storage engine, developed from LevelDB [16].
  • At Alibaba, the authors use both InnoDB and X-Engine with MySQL 5.7 to process e-commerce transactions in many of the database clusters.
  • MySQL 5.7 is a natural choice to evaluate InnoDB and X-Engine.
  • The authors used the latest release (Nov 2018) of RocksDB [14], and MyRocks (MySQL on RocksDB) [11]
  • Results:

    The authors first show how X-Engine performs when processing e-commerce OLTP workloads with mixtures of different types of queries.
  • The authors have already measured the real-world performance of MySQL databases running X-Engine during the Singles’ Day in Figure 1
  • Conclusion:

    The authors introduce X-Engine, an OLTP storage engine, optimized for the largest e-commerce platform in the world at Alibaba, serving more than 600 million active customers globally.
  • Online e-commerce transactions bring some distinct challenges, especially during the annual Singles’ Day Global Shopping Festival when customers shop extensively in a short period of time.
  • X-Engine has outperformed other storage engines processing the e-commerce workloads during online promotions and successfully served the Singles’ Day Shopping Festival.
  • The authors' ongoing and future work include adopting a shared storage design to improve its scalability for shared-nothing distributed transaction processing under sharding, and applying machine learning methods to predict data temperatures in order to facilitate intelligent scheduling of record placements in a tiered storage
Tables
  • Table1: Summary of optimizations in X-Engine
  • Table2: Configurations of MySQL (InnoDB)
  • Table3: Configurations of MyRocks (RocksDB)
  • Table4: Configurations of MySQL (X-Engine)
Download tables as Excel
Related work
  • Storage engine is a critical component of any OLTP database system. The Amazon Aurora storage engine, developed from a fork of InnoDB, exercises parallel and asynchronous writes to reduce the latency and improve the throughput of writes [1, 33]. We improve this design principle with a multistaged pipeline in X-Engine. Exploiting log-based structures is another efficient way to optimize the write performance [23, 24, 30]. The LSM-tree was proposed by O’Neil et al [27], and has attracted many subsequent efforts to optimize its design [8, 19, 31]. Chandramouli et al proposed a highly cache-optimized concurrent hash index with a hybrid log, supporting fast in-place updates [5]. Dong et al optimized the space amplification in RocksDB with a careful study of related trade-offs [9]. RocksDB also incorporates a separate stage for WAL writes [13], and concurrent writes in the memtable [12]. In X-Engine, we adopt the idea of memtables from LevelDB [16], and propose a suite of optimizations to achieve highly concurrent fast writes that reduce the write amplification of LSM-trees significantly.
Funding
  • Although the size of Level0 is probably less than 1% of the entire storage, it contains records that are only slightly older than those recently inserted ones in the memtables
Study subjects and analysis
Number of records: 1024
Throughput (GB/s). 0 2 4 8 16 32 64 128 256 512 1024 Number of records scanned. Percentage of reusable records

million records: 500
Queries per seconds (thousands). To evaluate the efficiency of X-Engine’s data reuse during compactions, we prepare two runs of records, containing 50 million and 500 million records, respectively. All keys are distributed uniformly at random

Reference
  • Steve Abraham. 2018. https://aws.amazon.com/cn/blogs/database/
    Findings
  • Mehmet Altinel, Christof Bornhövd, Sailesh Krishnamurthy, C. Mohan, Hamid Pirahesh, and Berthold Reinwald. 2003. Cache Tables: Paving the Way for an Adaptive Database Cache. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ’03), Vol. 29.
    Google ScholarLocate open access versionFindings
  • Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan R. Fogel, Bradley C. Kuszmaul, and Jelani Nelson. 2007. Cacheoblivious Streaming B-trees. In Proceedings of the Nineteenth Annual
    Google ScholarLocate open access versionFindings
  • Wei Cao, Zhenjun Liu, Peng Wang, Sen Chen, Caifeng Zhu, Song Zheng, Yuhui Wang, and Guoqing Ma. 2018. PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database. Proceedings of the VLDB Endowment 11, 12 (2018), 1849–1862.
    Google ScholarLocate open access versionFindings
  • Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A Concurrent Key-Value Store with In-Place Updates. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD ’18). ACM, New York, NY, USA, 275–290.
    Google ScholarLocate open access versionFindings
  • Shimin Chen, Phillip B. Gibbons, and Todd C. Mowry. 2001. Improving Index Performance Through Prefetching. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD ’01). ACM, New York, NY, USA, 235–246.
    Google ScholarLocate open access versionFindings
  • Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2018. Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems (2018).
    Google ScholarLocate open access versionFindings
  • Niv Dayan and Stratos Idreos. 201Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD ’18). ACM, New York, NY, USA, 505–520.
    Google ScholarLocate open access versionFindings
  • Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In The biennial Conference on Innovative Data Systems Research (CIDR), Vol. 3. 3.
    Google ScholarLocate open access versionFindings
  • Klaus Elhardt and Rudolf Bayer. 1984. A Database Cache for High Performance and Fast Restart in Database Systems. ACM Transactions on Database Systems (TODS) 9, 4 (Dec. 1984), 503–525.
    Google ScholarLocate open access versionFindings
  • Facebook. 2018. MyRocks. https://github.com/facebook/mysql-5.6/
    Findings
  • Facebook. 2018. RocksDB MemTable. https://github.com/facebook/
    Findings
  • Facebook. 2018. RocksDB Pipelined Write. https://github.com/
    Findings
  • Facebook. 2018. RocksDB Release v5.17.2. https://github.com/
    Findings
  • Facebook. 2019. RocksDB: A persistent key-value store for fast storage environments. https://rocksdb.org/.
    Findings
  • Sanjay Ghemawat and Jeff Dean. 2011. LevelDB. URL: s://github. com/google/leveldb,% 20http://leveldb. org (2011).
    Google ScholarLocate open access versionFindings
  • Goetz Graefe and Harumi Kuno. 2010. Self-selecting, Self-tuning, Incrementally Optimized Indexes. In Proceedings of the 13th International
    Google ScholarLocate open access versionFindings
  • Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, and Peter Boncz. 2010. Positional Update Handling in Column Stores. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD ’10). ACM, New York, NY, USA, 543– 554.
    Google ScholarLocate open access versionFindings
  • H. V. Jagadish, P. P. S. Narayan, S. Seshadri, S. Sudarshan, and Rama Kanneganti. 1997. Incremental Organization for Data Recording and Warehousing. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB ’97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 16–25.
    Google ScholarLocate open access versionFindings
  • Alexey Kopytov. 2019. Scriptable database and system performance benchmark. https://github.com/akopytov/sysbench.
    Findings
  • Tobin J. Lehman and Michael J. Carey. 1986. A Study of Index Structures for Main Memory Database Management Systems. In Proceedings of the 12th International Conference on Very Large Data Bases (VLDB ’86). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 294–303.
    Google ScholarLocate open access versionFindings
  • Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 38–49.
    Google ScholarLocate open access versionFindings
  • Justin Levandoski, David Lomet, and Sudipta Sengupta. 2013. LLAMA: A Cache/Storage Subsystem for Modern Hardware. Proceedings of the VLDB Endowment 6, 10 (Aug. 2013), 877–888.
    Google ScholarLocate open access versionFindings
  • Justin J Levandoski, David B Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 302–313.
    Google ScholarLocate open access versionFindings
  • MemSQL. 2019. Database Benchmark Tool. https://github.com/memsql/dbbench.
    Findings
  • MySQL. 2018. Introduction to InnoDB. https://dev.mysql.com/doc/refman/8.0/en/innodb-introduction.html.
    Findings
  • Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351–385.
    Google ScholarLocate open access versionFindings
  • Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. 2017. PebblesDB: Building Key-Value Stores Using Fragmented Log-Structured Merge Trees. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP ’17). ACM, New York, NY, USA, 497–514.
    Google ScholarLocate open access versionFindings
  • Kai Ren, Qing Zheng, Joy Arulraj, and Garth Gibson. 2017. SlimDB: A Space-efficient Key-value Storage Engine for Semi-sorted Data. Proceedings of the VLDB Endowment 10, 13 (Sept. 2017), 2037–2048.
    Google ScholarLocate open access versionFindings
  • Mendel Rosenblum and John K. Ousterhout. 1992. The Design and Implementation of a Log-structured File System. ACM Transactions on Computer Systems (TOCS) 10, 1 (Feb. 1992), 26–52.
    Google ScholarLocate open access versionFindings
  • Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A General Purpose Log Structured Merge Tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12). ACM, New York, NY, USA, 217–228.
    Google ScholarLocate open access versionFindings
  • Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, and Xiaodong Zhang. 2017. LSbM-tree: Re-enabling buffer caching in data management for mixed reads and writes. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 68–79.
    Google ScholarLocate open access versionFindings
  • Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao. 2017. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD ’17). ACM, New York, NY, USA, 1041–1052.
    Google ScholarLocate open access versionFindings
  • Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, and Beng Chin Ooi. 2012. LogBase: A Scalable Log-structured Database System in the Cloud. Proceedings of the VLDB Endowment 5, 10 (June 2012), 1004–1015.
    Google ScholarLocate open access versionFindings
Full Text
Your rating :
0

 

Tags
Comments