Extracting more concurrency from distributed transactions

OSDI, pp. 479-494, 2014.

Cited by: 99|Bibtex|Views90
EI
Other Links: dl.acm.org|dblp.uni-trier.de|academic.microsoft.com
Weibo:
When the contention is high, ROCOCO’s throughput is 130% and 347% higher than that of 2PL and optimistic concurrency control

Abstract:

Distributed storage systems run transactions across machines to ensure serializability. Traditional protocols for distributed transactions are based on two-phase locking (2PL) or optimistic concurrency control (OCC). 2PL serializes transactions as soon as they conflict and OCC resorts to aborts, leaving many opportunities for concurrency ...More

Code:

Data:

0
Introduction
  • Many large-scale Web services, such as Amazon, rely on a distributed online transaction processing (OLTP) system as their storage backend.
  • OLTP systems require concurrency control to guarantee strict serializability [12, 13], so that websites running on top of them can function correctly.
  • Contention is not rare in large-scale OLTP applications.
  • Consider a transaction where a customer purchases a few items from a shopping website.
  • As the system scales—i.e., the site becomes more popular and has more customers, but maintains a relatively stable set of items—concurrent purchases to the same item are more likely to happen, leading to a greater contention rate
Highlights
  • Many large-scale Web services, such as Amazon, rely on a distributed online transaction processing (OLTP) system as their storage backend
  • When the contention is high, ROCOCO’s throughput is 130% and 347% higher than that of 2PL and optimistic concurrency control (OCC)
  • OLTP systems require concurrency control to guarantee strict serializability [12, 13], so that websites running on top of them can function correctly
  • This section will show that ROCOCO has higher throughput and lower latency than OCC and 2PL under all levels of contention and that as contention increases
  • Our prototype contains over 20 000 lines of C++ code, of which 10 000 are for concurrency control
  • This paper presented ROCOCO, a novel concurrency control protocol for distributed transactions
Methods
  • The design of ROCOCO includes an offline checker and a runtime protocol.
  • The runtime protocol tracks the dependencies between pieces and reorders their execution if necessary for correctness.
  • The authors explain ROCOCO’s offline check (§ 3.1), runtime protocol (§ 3.2), and sketch its correctness (§ 3.3).
  • This subsection explains the difference between immediate pieces of a transaction that cannot be reordered and deferrable pieces that can
  • It explains how ROCOCO’s offline checker uses transaction profiles including immediate/deferrable information to check for the necessary conditions
Results
  • The authors' evaluation explores two key questions: 1.
  • The authors' prototype contains over 20 000 lines of C++ code, of which 10 000 are for concurrency control
  • It uses a custom RPC library implemented by one of the authors for communication [4].
  • It adopts the simple threading model of H-Store [52] that uses a single worker thread on each server to sequentially process the server’s transaction pieces.
  • Stored procedure— i.e., a piece of a transaction—is written as a C++ function that is loaded into the server binary at launch time
Conclusion
  • This paper presented ROCOCO, a novel concurrency control protocol for distributed transactions.
  • With the help of offline checking, ROCOCO reorders pieces of interfering transactions into a strict-serializable order and avoids aborts.
  • In a scaled TPC-C benchmark ROCOCO outperformed conventional protocols and showed stable performance with increasing contention
Summary
  • Introduction:

    Many large-scale Web services, such as Amazon, rely on a distributed online transaction processing (OLTP) system as their storage backend.
  • OLTP systems require concurrency control to guarantee strict serializability [12, 13], so that websites running on top of them can function correctly.
  • Contention is not rare in large-scale OLTP applications.
  • Consider a transaction where a customer purchases a few items from a shopping website.
  • As the system scales—i.e., the site becomes more popular and has more customers, but maintains a relatively stable set of items—concurrent purchases to the same item are more likely to happen, leading to a greater contention rate
  • Methods:

    The design of ROCOCO includes an offline checker and a runtime protocol.
  • The runtime protocol tracks the dependencies between pieces and reorders their execution if necessary for correctness.
  • The authors explain ROCOCO’s offline check (§ 3.1), runtime protocol (§ 3.2), and sketch its correctness (§ 3.3).
  • This subsection explains the difference between immediate pieces of a transaction that cannot be reordered and deferrable pieces that can
  • It explains how ROCOCO’s offline checker uses transaction profiles including immediate/deferrable information to check for the necessary conditions
  • Results:

    The authors' evaluation explores two key questions: 1.
  • The authors' prototype contains over 20 000 lines of C++ code, of which 10 000 are for concurrency control
  • It uses a custom RPC library implemented by one of the authors for communication [4].
  • It adopts the simple threading model of H-Store [52] that uses a single worker thread on each server to sequentially process the server’s transaction pieces.
  • Stored procedure— i.e., a piece of a transaction—is written as a C++ function that is loaded into the server binary at launch time
  • Conclusion:

    This paper presented ROCOCO, a novel concurrency control protocol for distributed transactions.
  • With the help of offline checking, ROCOCO reorders pieces of interfering transactions into a strict-serializable order and avoids aborts.
  • In a scaled TPC-C benchmark ROCOCO outperformed conventional protocols and showed stable performance with increasing contention
Tables
  • Table1: TPC-C commit transaction mix ratio in a ROCOCO trial. rw stands for general read-write transactions and ro stands for read-only
  • Table2: Effect of logging in our local cluster
Download tables as Excel
Related work
  • General transactions with 2PL and OCC. Many seminal distributed databases such as Gamma [22], Bubba [16], and R* [45] use forms of 2PL. Spanner [19] is Google’s linearizable global-scale database that uses 2PL for read-write transactions and a separate timestamp based protocol from read-only transactions. Replicated Commit optimizes the across site latency in Spanner’s commit protocol [44].

    OCC is also used in several recent systems, such as H-Store [33] and VoltDB [6]. MDCC [35] uses OCC for geo-replicated storage. Percolator uses OCC to pro-
Funding
  • This work is supported in part by the National Science Foundation under award CNS-1218117
  • We also thank Garth Gibson and the PRObE team for the testbed (NSF awards CNS-1042537 and CNS-1042543). Shuai Mu’s work is also supported by the China Scholarship Council
Reference
  • Kodiak testbeds. http://portal.nmc-probe.org/.
    Findings
  • Retwis. http://retwis.antirez.com/.
    Findings
  • RUBiS. http://rubis.ow2.org/. in https://github.com/santazhang/simple-rpc.
    Findings
  • [5] TPC-C Benchmark. http://www.tpc.org/tpcc/.
    Findings
  • [6] VoltDB. http://www.voltdb.com/.
    Findings
  • [7] A. Adya, R. Gruber, B. Liskov, and U. Maheshwari. Efficient optimistic concurrency control using loosely synchronized clocks. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, SIGMOD ’95, pages 23–34, New York, NY, USA, 1995. ACM.
    Google ScholarLocate open access versionFindings
  • [8] R. Agrawal, M. J. Carey, and M. Livny. Concurrency control performance modeling: alternatives and implications. ACM Transactions on Database Systems (TODS), 12(4):609–654, 1987.
    Google ScholarLocate open access versionFindings
  • [9] M. K. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. In ACM SIGOPS Operating Systems Review, volume 41, pages 159–174. ACM, 2007.
    Google ScholarLocate open access versionFindings
  • [10] J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Leon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing Scalable, Highly Available Storage for Interactive Services. In CIDR, volume 11, pages 223– 234, 2011.
    Google ScholarLocate open access versionFindings
  • [11] A. J. Bernstein, D. S. Gerstl, and P. M. Lewis. Concurrency control for step-decomposed transactions. Information Systems, 24(8):673–698, 1999.
    Google ScholarLocate open access versionFindings
  • [12] P. A. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Computing Surveys (CSUR), 13(2):185–221, 1981.
    Google ScholarLocate open access versionFindings
  • [13] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems, volume 370. Addison-wesley New York, 1987.
    Google ScholarFindings
  • [14] P. Bhatotia, A. Wieder, ̇I. E. Akkus, R. Rodrigues, and U. A. Acar. Large-scale incremental data processing with change propagation. In Proceedings of the 3rd USENIX conference on Hot topics in cloud computing, pages 18– 18. USENIX Association, 2011.
    Google ScholarLocate open access versionFindings
  • [15] W. J. Bolosky, D. Bradshaw, R. B. Haagens, N. P. Kusters, and P. Li. Paxos replicated state machines as the basis of a high-performance data store. In Proceedings of the 8th USENIX conference on Networked systems design and implementation, pages 11–11. USENIX Association, 2011.
    Google ScholarLocate open access versionFindings
  • [16] H. Boral, W. Alexander, L. Clay, G. Copeland, S. Danforth, M. Franklin, B. Hart, M. Smith, and P. Valduriez. Prototyping Bubba, a highly parallel database system. Knowledge and Data Engineering, IEEE Transactions on, 2(1):4–24, 1990.
    Google ScholarLocate open access versionFindings
  • [17] Y. Breitbart, H. Garcia-Molina, and A. Silberschatz. Overview of multidatabase transaction management. The VLDB Journal, 1(2):181–239, 1992.
    Google ScholarLocate open access versionFindings
  • [18] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!’s hosted data serving platform. Proceedings of the VLDB Endowment, 1(2):1277–1288, 2008.
    Google ScholarLocate open access versionFindings
  • [19] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Googles globally distributed database. ACM Transactions on Computer Systems (TOCS), 31(3):8, 2013.
    Google ScholarLocate open access versionFindings
  • [20] J. Cowling and B. Liskov. Granola: low-overhead distributed transaction coordination. In Proceedings of the 2012 USENIX conference on Annual Technical Conference, pages 21–21. USENIX Association, 2012.
    Google ScholarLocate open access versionFindings
  • [21] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP ’07, pages 205–220, New York, NY, USA, 2007. ACM.
    Google ScholarLocate open access versionFindings
  • [22] D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H.-I. Hsiao, and R. Rasmussen. The Gamma database machine project. Knowledge and Data Engineering, IEEE Transactions on, 2(1):44–62, 1990.
    Google ScholarLocate open access versionFindings
  • [23] R. Escriva, B. Wong, and E. G. Sirer. Hyperdex: A distributed, searchable key-value store. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM ’12, pages 25–36, New York, NY, USA, 2012. ACM.
    Google ScholarLocate open access versionFindings
  • [24] R. Escriva, B. Wong, and E. G. Sirer. Warp: Lightweight Multi-Key Transactions for Key-Value Stores. Technical report, 2014.
    Google ScholarFindings
  • [25] H. Garcia-Molina. Using semantic knowledge for transaction processing in a distributed database. ACM Transactions on Database Systems (TODS), 8(2):186–213, 1983.
    Google ScholarLocate open access versionFindings
  • [26] H. Garcia-Molina, R. J. Lipton, and J. Valdes. A massive memory machine. Computers, IEEE Transactions on, 100(5):391–399, 1984.
    Google ScholarLocate open access versionFindings
  • [27] H. Garcia-Molina and K. Salem. Sagas. In Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data, SIGMOD ’87, pages 249–259, New York, NY, USA, 1987. ACM.
    Google ScholarLocate open access versionFindings
  • [28] H. Garcia-Molina and K. Salem. Main memory database systems: An overview. Knowledge and Data Engineering, IEEE Transactions on, 4(6):509–516, 1992.
    Google ScholarLocate open access versionFindings
  • [29] J. Gray and L. Lamport. Consensus on transaction commit. ACM Transactions on Database Systems (TODS), 31(1):133–160, 2006.
    Google ScholarLocate open access versionFindings
  • [30] S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker. OLTP through the looking glass, and what we found there. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 981–992. ACM, 2008.
    Google ScholarLocate open access versionFindings
  • [31] M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, 1990.
    Google ScholarLocate open access versionFindings
  • [32] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX conference on USENIX annual technical conference, volume 8, pages 11–11, 2010.
    Google ScholarLocate open access versionFindings
  • [33] E. P. Jones, D. J. Abadi, and S. Madden. Low over- 15 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’14) 493 head concurrency control for partitioned main memory databases. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 603–614. ACM, 2010.
    Google ScholarLocate open access versionFindings
  • [34] R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. Jones, S. Madden, M. Stonebraker, Y. Zhang, et al. H-store: a high-performance, distributed main memory transaction processing system. Proceedings of the VLDB Endowment, 1(2):1496–1499, 2008.
    Google ScholarLocate open access versionFindings
  • [35] T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data center consistency. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 113–126. ACM, 2013.
    Google ScholarLocate open access versionFindings
  • [36] H.-T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
    Google ScholarLocate open access versionFindings
  • [37] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40, 2010.
    Google ScholarLocate open access versionFindings
  • [39] L. Lamport. Paxos made simple. ACM Sigact News, 32(4):18–25, 2001.
    Google ScholarLocate open access versionFindings
  • [40] C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguica, and R. Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. In Proceedings USENIX Symposium on Operating System Design and Implementation (OSDI), 2012.
    Google ScholarLocate open access versionFindings
  • [41] K. Li and J. F. Naughton. Multiprocessor main memory transaction processing. In Proceedings of the first international symposium on Databases in parallel and distributed systems, pages 177–187. IEEE Computer Society Press, 2000.
    Google ScholarLocate open access versionFindings
  • [42] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 401–416. ACM, 2011.
    Google ScholarLocate open access versionFindings
  • [43] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Stronger semantics for low-latency geo-replicated storage. In Symposium on Networked Systems Design and Implementation, 2013.
    Google ScholarLocate open access versionFindings
  • [44] H. Mahmoud, F. Nawab, A. Pucher, D. Agrawal, and A. El Abbadi. Low-latency multi-datacenter databases using replicated commit. Proceedings of the VLDB Endowment, 6(9):661–672, 2013.
    Google ScholarLocate open access versionFindings
  • [45] C. Mohan, B. Lindsay, and R. Obermarck. Transaction management in the R* distributed database management system. ACM Transactions on Database Systems (TODS), 11(4):378–396, 1986.
    Google ScholarLocate open access versionFindings
  • [46] I. Moraru, D. G. Andersen, and M. Kaminsky. There is more consensus in Egalitarian parliaments. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 358–372. ACM, 2013.
    Google ScholarLocate open access versionFindings
  • [47] S. Mu, Y. Cui, Y. Zhang, W. Lloyd, and J. Li. Extracting More Concurrency from Distribted Transactions. Technical Report TR2014-970, New York University, Courant Institute of Mathematical Sciences, 2014.
    Google ScholarFindings
  • [48] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis II. System level concurrency control for distributed database systems. ACM Transactions on Database Systems (TODS), 3(2):178–198, 1978.
    Google ScholarLocate open access versionFindings
  • [49] D. Shasha, F. Llirbat, E. Simon, and P. Valduriez. Transaction chopping: Algorithms and performance studies. ACM Transactions on Database Systems (TODS), 20(3):325–363, 1995.
    Google ScholarLocate open access versionFindings
  • [50] D. Shasha, E. Simon, and P. Valduriez. Simple rational guidance for chopping up transactions. In ACM SIGMOD Record, volume 21, pages 298–307. ACM, 1992.
    Google ScholarLocate open access versionFindings
  • [51] Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 385–400. ACM, 2011.
    Google ScholarLocate open access versionFindings
  • [52] M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era:(it’s time for a complete rewrite). In Proceedings of the 33rd international conference on Very large data bases, pages 1150–1160. VLDB Endowment, 2007.
    Google ScholarLocate open access versionFindings
  • [53] R. Tarjan. Depth-first search and linear graph algorithms. SIAM journal on computing, 1(2):146–160, 1972.
    Google ScholarLocate open access versionFindings
  • [54] A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: fast distributed transactions for partitioned database systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 1–12. ACM, 2012.
    Google ScholarLocate open access versionFindings
  • [55] S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 18–32, New York, NY, USA, 2013. ACM.
    Google ScholarLocate open access versionFindings
  • [56] A. Whitney, D. Shasha, and S. Apter. High volume transaction processing without concurrency control, two phase commit, sql or C++. In Seventh International Workshop on High Performance Transaction Systems, Asilomar, 1997.
    Google ScholarLocate open access versionFindings
  • [57] Y. Zhang, R. Power, S. Zhou, Y. Sovran, M. K. Aguilera, and J. Li. Transaction chains: achieving serializability with low latency in geo-distributed storage systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 276–291. ACM, 2013.
    Google ScholarLocate open access versionFindings
Full Text
Your rating :
0

 

Tags
Comments