Datacenter RPCs can be General and Fast

NSDI, 2019 (Best Paper).

We describe the design and implementation of a high-performance remote procedure call library for datacenter networks.

Abstract:

It is commonly believed that datacenter networking software must sacrifice generality to attain high performance. The popularity of specialized distributed systems designed specifically for niche technologies such as RDMA, lossless networks, FPGAs, and programmable switches testifies to this belief. In this paper, we show that such specialization is not necessary…

Highlights
  • Background and motivation

    We first discuss aspects of modern datacenter networks relevant to eRPC
  • We describe the design and implementation of a high-performance remote procedure call (RPC) library for datacenter networks
  • The evaluation clusters include two types of networks (lossy Ethernet and lossless InfiniBand) and three generations of NICs released between 2011 (CX3) and 2017 (CX5); eRPC works well on all three, showing that our design is robust to NIC and network technology changes
  • eRPC is primarily optimized for Mellanox NICs. eRPC works with DPDK-capable NICs that support flow steering
  • eRPC is a fast, general-purpose RPC system that provides an attractive alternative to putting more functions in network hardware, and to specialized system designs that depend on these functions. eRPC’s speed comes from prioritizing common-case performance, carefully combining a wide range of old and new optimizations, and the observation that switch buffer capacity far exceeds the datacenter bandwidth-delay product (BDP). eRPC delivers performance that was until now believed possible only with lossless RDMA fabrics or specialized network hardware
  • The first packet’s data and header are contiguous. This allows the NIC to fetch small messages with one DMA read; using multiple DMAs for small messages would substantially increase NIC processing and PCIe use, reducing message rate by up to 20% [40] (see the sketch after this list)
  • The TX queue must allow sufficient pipelining to hide PCIe latency; we found that 64 entries are sufficient in all cases. eRPC’s TX queue and TX completion queue have 64 entries by default, so their footprint does not depend on cluster size
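The single-DMA highlight above is easiest to see with a concrete layout. The C++ sketch below is illustrative only, not eRPC’s actual MsgBuffer code; the header fields and their widths are assumptions made for the example. Its one point is the property the highlight describes: the first packet’s header sits immediately before the application payload, so a small request is a single contiguous region that the NIC can fetch with one DMA read.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical packet header; field names and widths are illustrative only.
struct PktHdr {
  uint64_t req_type : 8;
  uint64_t msg_size : 24;   // total message size in bytes
  uint64_t pkt_num  : 16;   // packet number within the message
  uint64_t magic    : 16;
};
static_assert(sizeof(PktHdr) == 8, "keep the header small");

// A message buffer that keeps the first packet's header and payload
// contiguous, so the NIC can fetch a small message with a single DMA read.
struct MsgBuf {
  uint8_t *buf;      // points to [PktHdr][payload...]
  size_t data_size;  // payload bytes

  explicit MsgBuf(size_t data_bytes) : data_size(data_bytes) {
    buf = static_cast<uint8_t *>(std::malloc(sizeof(PktHdr) + data_bytes));
  }
  ~MsgBuf() { std::free(buf); }

  PktHdr *hdr() { return reinterpret_cast<PktHdr *>(buf); }
  uint8_t *data() { return buf + sizeof(PktHdr); }

  // For a single-packet message, the TX descriptor covers one contiguous
  // region: header + payload.
  size_t dma_len() const { return sizeof(PktHdr) + data_size; }
};

int main() {
  MsgBuf m(32);                 // a 32-byte request
  m.hdr()->msg_size = 32;
  std::memset(m.data(), 0xab, 32);
  // A real transport would post {m.buf, m.dma_len()} as one TX descriptor.
  return static_cast<int>(m.dma_len());
}
```

The TX-queue highlight is the complementary pipelining concern: enough posted descriptors (64 by default in eRPC, per the bullet above) must be in flight at once to cover the PCIe round trip between posting a descriptor and the NIC consuming it.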
Summary
  • Introduction:

    Background and motivation

    The authors first discuss aspects of modern datacenter networks relevant to eRPC.
  • Modern datacenter networks provide tens of Gbps per-port bandwidth and a few microseconds round-trip latency [73, §2.1]
  • They support polling-based network I/O from userspace, eliminating interrupts and system call overhead from the datapath [28, 29].
  • The authors found that restricting each flow to one BDP of outstanding data prevents most packet drops even on lossy networks (see the worked numbers after the Summary)
  • The authors discuss these aspects below.
  • Lossless fabrics are useful even without RDMA: Some systems that do not use remote CPU bypass leverage losslessness to avoid the complexity and overhead of handling packet loss in software [38, 39, 47]
  • Objectives:

    The authors' goal is to allow developers to use eRPC in unmodified systems. The work aims to answer the question: can a general-purpose RPC library provide performance comparable to specialized systems? The authors' solution is based on two key insights. (A hypothetical sketch of the asynchronous programming model such a library exposes appears after the Summary.)
  • Results:

    Evaluation clusters

    Table 1 shows the clusters used in this paper. They include two types of networks (lossy Ethernet and lossless InfiniBand), and three generations of NICs released between 2011 (CX3) and 2017 (CX5). eRPC works well on all three clusters, showing that the design is robust to NIC and network technology changes.
  • For Mellanox Ethernet NICs, the authors generate UDP packets directly with libibverbs instead of going through DPDK, which internally uses libibverbs for these NICs. The authors' evaluation primarily uses the large CX4 cluster, which resembles real-world datacenters.
  • The six switches in the CX4 cluster are organized as five ToRs, each with forty 25 GbE downlinks and five 100 GbE uplinks, for a 2:1 oversubscription (see the worked numbers after the Summary)
  • Conclusion:

    eRPC is a fast, general-purpose RPC system that provides an attractive alternative to putting more functions in network hardware, and to specialized system designs that depend on these functions. eRPC’s speed comes from prioritizing common-case performance, carefully combining a wide range of old and new optimizations, and the observation that switch buffer capacity far exceeds the datacenter BDP. eRPC delivers performance that was until now believed possible only with lossless RDMA fabrics or specialized network hardware.
  • It allows unmodified applications to perform close to the hardware limits.
  • The authors' ported versions of LibRaft and Masstree are, to their knowledge, the fastest replicated key-value store and networked database index in the academic literature, while operating end-to-end without additional network support
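To make the one-BDP flow-control limit and the CX4 oversubscription figure above concrete, here is a small worked-numbers sketch. The 25 GbE link speed and the ToR port counts come from the Results bullets; the 6 µs round-trip time is an assumed illustrative value consistent with the "few microseconds" latency quoted above, so the BDP figure is approximate.

```cpp
#include <cstdio>

int main() {
  // Bandwidth-delay product: the most data one flow keeps in flight under
  // the "one BDP of outstanding data" rule.
  const double link_gbps = 25.0;  // CX4 cluster: 25 GbE links
  const double rtt_us = 6.0;      // assumed RTT; the paper says "a few microseconds"
  const double bdp_bytes = link_gbps * 1e9 / 8.0 * rtt_us * 1e-6;
  std::printf("BDP ~= %.1f KB per flow\n", bdp_bytes / 1000.0);  // ~18.8 KB

  // CX4 ToR oversubscription: downlink vs. uplink capacity.
  const double down_gbps = 40 * 25.0;  // 40 x 25 GbE downlinks = 1000 Gbps
  const double up_gbps = 5 * 100.0;    // 5 x 100 GbE uplinks   =  500 Gbps
  std::printf("oversubscription = %.0f:1\n", down_gbps / up_gbps);  // 2:1
  return 0;
}
```

At roughly 19 KB in flight per flow, this is the quantity the Conclusion's observation is about: because switch buffer capacity far exceeds the datacenter BDP, capping each flow at one BDP prevents most drops even on lossy Ethernet.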
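The Objectives bullet describes eRPC as a general-purpose RPC library that unmodified applications drive through polling-based I/O. As a rough illustration of that style (request handlers registered on one side, continuations on the other, and an explicitly polled event loop), here is a minimal self-contained C++ sketch. All names here are hypothetical and invented for illustration; this is not eRPC's API, and "delivery" is an in-process queue rather than a real network.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical in-process stand-in for an RPC endpoint. A real library would
// move these byte strings over the network; a local queue is enough to show
// the handler/continuation structure.
class MiniRpc {
 public:
  using Handler = std::function<std::string(const std::string &req)>;
  using Continuation = std::function<void(const std::string &resp)>;

  // Server side: register a handler for a request type.
  void register_handler(uint8_t req_type, Handler h) { handlers_[req_type] = h; }

  // Client side: enqueue a request; the continuation runs when the response
  // arrives, during a later call to run_event_loop_once().
  void enqueue_request(uint8_t req_type, std::string req, Continuation cont) {
    pending_.push_back({req_type, std::move(req), std::move(cont)});
  }

  // Polling-based event loop: no interrupts or blocking system calls on the
  // datapath; the application calls this repeatedly.
  void run_event_loop_once() {
    if (pending_.empty()) return;
    Pending p = std::move(pending_.front());
    pending_.pop_front();
    std::string resp = handlers_.at(p.req_type)(p.req);  // "server" work
    p.cont(resp);                                        // client continuation
  }

 private:
  struct Pending { uint8_t req_type; std::string req; Continuation cont; };
  std::unordered_map<uint8_t, Handler> handlers_;
  std::deque<Pending> pending_;
};

int main() {
  constexpr uint8_t kEcho = 1;
  MiniRpc rpc;
  rpc.register_handler(kEcho, [](const std::string &req) { return "echo: " + req; });

  bool done = false;
  rpc.enqueue_request(kEcho, "hello", [&](const std::string &resp) {
    std::cout << resp << "\n";
    done = true;
  });
  while (!done) rpc.run_event_loop_once();
  return 0;
}
```

The shape mirrors the Summary's description of the datapath: the application makes progress by repeatedly polling the event loop from userspace rather than waiting on interrupts or blocking system calls.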
Tables
  • Table1: Measurement clusters. CX4 and CX3 are CloudLab [59] and Emulab [68] clusters, respectively
  • Table2: Comparison of median latency with eRPC and RDMA
  • Table3: Impact of disabling optimizations on small RPC rate (CX4)
  • Table4: eRPC’s 8 MB request throughput with packet loss
  • Table5: Effectiveness of congestion control (cc) during incast
  • Table6: Latency comparison for replicated PUTs. (The authors use LibRaft as-is: it is well-tested with fuzzing over a network simulator and 150+ unit tests, and its only requirement is that the user provide callbacks for sending and handling RPCs, which they implement using eRPC. Porting to eRPC required no changes to LibRaft’s code.)
Related work
  • RPCs. There is a vast amount of literature on RPCs. The practice of optimizing an RPC wire protocol for small RPCs originates with Birrell and Nelson [19], who introduce the idea of an implicit-ACK. Similar to eRPC, the Sprite RPC system [67] directly uses raw datagrams and performs retransmissions only at clients. The Direct Access File System [23] was one of the first to use RDMA in RPCs. It uses SEND/RECV messaging over a connected transport to initiate an RPC, and RDMA reads or writes to transfer the bulk of large RPC messages. This design is widely used in other systems such as NFS’s RPCs [20] and some MPI implementations [48]. In eRPC, we chose to transfer all data over datagram messaging to avoid the scalability limits of RDMA. Other RPC systems that use RDMA include Mellanox’s Accelio [4] and RFP [63]. These systems perform comparably to FaRM’s RPCs, which are slower than eRPC at scale by an order of magnitude.
Funding
  • This work was supported by funding from the National Science Foundation under awards CCF-1535821 and CNS-1700521, and by Intel via the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS)
  • Anuj Kalia is supported by the Facebook Fellowship
Reference
  • [1] Private communication with Mellanox.
  • [2] Fast memcpy with SPDK and Intel I/OAT DMA Engine. https://software.intel.com/en-us/articles/
  • [3] https://www.nextplatform.com/2017/03/13/peek-inside-facebooks-server-fleet-upgrade/, 2017.
  • [4] Mellanox Accelio. http://www.accelio.org, 2017.
  • [5] Mellanox MLNX-OS user manual for Ethernet. http://www.mellanox.com/related-docs/prod_management_software/MLNX-OS_ETH_v3_6_3508_UM.pdf, 2017.
  • [6] Mellanox OFED for Linux release notes. http://www.mellanox.com/related-docs/prod_ 3_2-1_0_1_1.pdf, 2017.
  • [7] Oak Ridge leadership computing facility - Summit. https://www.olcf.ornl.gov/summit/, 2017.
  • [8] Aurora 710 based on Barefoot Tofino switching silicon. https://netbergtw.com/products/aurora-
  • [9] Facebook open switching system FBOSS and Wedge in the open. https://code.facebook.com/posts/843620439027582/facebook-open-switchingsystem-fboss-and-wedge-in-the-open/, 2018.
  • [10] RDMAmojo - blog on RDMA technology and programming by Dotan Barak. http://www.rdmamojo.com/2013/01/12/ibv_modify_qp/, 2018.
  • [11] Distributed asynchronous object storage stack. https://github.com/daos-stack, 2018.
  • [12] Tolly report: Mellanox SX1016 and SX1036. http://www.mellanox. Tolly212113MellanoxSwitchSXPerformance.pdf, 2018.
  • [13] Jim Warner’s switch buffer page. https://people.ucsc.edu/~warner/buffer.html, 2018.
  • [14] https://github.com/willemt/raft, 2018.
  • [15] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. pFabric: Minimal near-optimal datacenter transport. In Proc. ACM SIGCOMM, Hong Kong, China, Aug. 2013.
  • [16] B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In Proceedings of SIGMETRICS ’12, June 2012.
  • [17] A. Belay, G. Prekas, A. Klimovic, S. Grossman, C. Kozyrakis, and E. Bugnion. IX: A protected dataplane operating system for high throughput and low latency. In Proc. 11th USENIX OSDI, 2014.
  • [18] C. Binnig, A. Crotty, A. Galakatos, T. Kraska, and E. Zamanian. The end of slow networks: It’s time for a redesign. In Proc. VLDB, New Delhi, India, Aug. 2016.
  • [19] A. D. Birrell and B. J. Nelson. Implementing remote procedure calls. ACM Trans. Comput. Syst., 1984.
  • [20] B. Callaghan, T. Lingutla-Raj, A. Chiu, P. Staubach, and O. Asad. NFS over RDMA. In Proceedings of the ACM SIGCOMM Workshop on Network-I/O Convergence: Experience, Lessons, Implications, 2003.
  • [21] Y. Chen, X. Wei, J. Shi, R. Chen, and H. Chen. Fast and general distributed transactions using RDMA and HTM. In Proc. 11th ACM European Conference on Computer Systems (EuroSys), Apr. 2016.
  • [22] D. Crupnicoff, M. Kagan, A. Shahar, N. Bloch, and H. Chapman. Dynamically-connected transport service, May 19, 2011. URL https://www.google.com/
  • [23] M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T. Talpey, and M. Wittle. The direct access file system. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, 2003.
  • [24] DPDK. Data Plane Development Kit (DPDK). http://dpdk.org/, 2017.
  • [25] A. Dragojević, D. Narayanan, O. Hodson, and M. Castro. FaRM: Fast remote memory. In Proc. 11th USENIX NSDI, Seattle, WA, Apr. 2014.
  • [26] A. Dragojević, D. Narayanan, E. B. Nightingale, M. Renzelmann, A. Shamis, A. Badam, and M. Castro. No compromises: Distributed transactions with consistency, availability, and performance. In Proc. 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015.
  • [27] A. Dragojević, D. Narayanan, and M. Castro. RDMA reads: To use or not to use? IEEE Data Eng. Bull., 2017.
  • [28] M. Dalton et al. Andromeda: Performance, isolation, and velocity at scale in cloud network virtualization. In Proc. 15th USENIX NSDI, Renton, WA, Apr. 2018.
  • [29] D. Firestone et al. Azure accelerated networking: SmartNICs in the public cloud. In Proc. 15th USENIX NSDI, Renton, WA, Apr. 2018.
  • [30] C. Guo, L. Yuan, D. Xiang, Y. Dang, R. Huang, D. A. Maltz, Z. Liu, V. Wang, B. Pang, H. Chen, Z. Lin, and V. Kurien. Pingmesh: A large-scale system for data center network latency measurement and analysis. In Proc. ACM SIGCOMM, London, UK, Aug. 2015.
  • [31] C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. Setty, and B. Zill. IronFleet: Proving practical distributed systems correct. In Proc. 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015.
  • [32] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proc. USENIX Annual Technical Conference, Boston, MA, June 2010.
  • [33] Z. István, D. Sidler, G. Alonso, and M. Vukolic. Consensus in a box: Inexpensive coordination in hardware. In Proc. 13th USENIX NSDI, Santa Clara, CA, May 2016.
  • [34] Z. István, D. Sidler, and G. Alonso. Caribou: Intelligent distributed storage. In Proc. VLDB, Aug. 2017.
  • [35] E. Jeong, S. Woo, M. Jamshed, H. Jeong, S. Ihm, D. Han, and K. Park. mTCP: A highly scalable user-level TCP stack for multicore systems. In Proc. 11th USENIX NSDI, Seattle, WA, Apr. 2014.
  • [36] X. Jin, X. Li, H. Zhang, R. Soulé, J. Lee, N. Foster, C. Kim, and I. Stoica. NetCache: Balancing key-value stores with fast in-network caching. In Proc. 26th ACM Symposium on Operating Systems Principles (SOSP), Shanghai, China, Oct. 2017.
  • [37] X. Jin, X. Li, H. Zhang, N. Foster, J. Lee, R. Soulé, C. Kim, and I. Stoica. NetChain: Scale-free sub-RTT coordination. In Proc. 15th USENIX NSDI, Renton, WA, Apr. 2018.
  • [38] A. Kalia, M. Kaminsky, and D. G. Andersen. Using RDMA efficiently for key-value services. In Proc. ACM SIGCOMM, Chicago, IL, Aug. 2014.
  • [39] A. Kalia, M. Kaminsky, and D. G. Andersen. FaSST: Fast, scalable and simple distributed transactions with two-sided RDMA datagram RPCs. In Proc. 12th USENIX OSDI, Savannah, GA, Nov. 2016.
  • [40] A. Kalia, M. Kaminsky, and D. G. Andersen. Design guidelines for high-performance RDMA systems. In Proc. USENIX Annual Technical Conference, Denver, CO, June 2016.
  • [41] D. Kim, A. Memaripour, A. Badam, Y. Zhu, H. H. Liu, J. Padhye, S. Raindel, S. Swanson, V. Sekar, and S. Seshan. HyperLoop: Group-based NIC-offloading to accelerate replicated transactions in multi-tenant storage systems. In Proc. ACM SIGCOMM, Budapest, Hungary, Aug. 2018.
  • [42] M. J. Koop, J. K. Sridhar, and D. K. Panda. Scalable MPI design over InfiniBand using eXtended Reliable Connection. In 2008 IEEE International Conference on Cluster Computing, 2008.
  • [43] J. Li, E. Michael, N. K. Sharma, A. Szekeres, and D. R. K. Ports. Just say no to Paxos overhead: Replacing consensus with network ordering. In Proc. 12th USENIX OSDI, Savannah, GA, Nov. 2016.
  • [44] J. Li, E. Michael, and D. R. K. Ports. Eris: Coordination-free consistent transactions using in-network concurrency control. In Proc. 26th ACM Symposium on Operating Systems Principles (SOSP), Shanghai, China, Oct. 2017.
  • [45] S. Li, H. Lim, V. W. Lee, J. H. Ahn, A. Kalia, M. Kaminsky, D. G. Andersen, O. Seongil, S. Lee, and P. Dubey. Architecting to achieve a billion requests per second throughput on a single key-value store server platform. In ISCA, 2015.
  • [46] H. Lim, D. Han, D. G. Andersen, and M. Kaminsky. MICA: A holistic approach to fast in-memory key-value storage. In Proc. 11th USENIX NSDI, Seattle, WA, Apr. 2014.
  • [47] F. Liu, L. Yin, and S. Blanas. Design and evaluation of an RDMA-aware data shuffling operator for parallel database systems. In Proc. 12th ACM European Conference on Computer Systems (EuroSys), Apr. 2017.
  • [48] J. Liu, J. Wu, and D. K. Panda. High performance RDMA-based MPI implementation over InfiniBand. International Journal of Parallel Programming, 2004.
  • [49] Y. Mao, E. Kohler, and R. T. Morris. Cache craftiness for fast multicore key-value storage. In Proc. 7th ACM European Conference on Computer Systems (EuroSys), Bern, Switzerland, Apr. 2012.
  • [50] C. Mitchell, Y. Geng, and J. Li. Using one-sided RDMA reads to build a fast, CPU-efficient key-value store. In Proc. USENIX Annual Technical Conference, San Jose, CA, June 2013.
  • [51] C. Mitchell, K. Montgomery, L. Nelson, S. Sen, and J. Li. Balancing CPU and network in the Cell distributed B-tree store. In Proc. USENIX Annual Technical Conference, Denver, CO, June 2016.
  • [52] R. Mittal, T. Lam, N. Dukkipati, E. Blem, H. Wassel, M. Ghobadi, A. Vahdat, Y. Wang, D. Wetherall, and D. Zats. TIMELY: RTT-based congestion control for the datacenter. In Proc. ACM SIGCOMM, London, UK, Aug. 2015.
  • [53] R. Mittal, A. Shpiner, A. Panda, E. Zahavi, A. Krishnamurthy, S. Ratnasamy, and S. Shenker. Revisiting network support for RDMA. In Proc. ACM SIGCOMM, Budapest, Hungary, Aug. 2018.
  • [54] D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proc. USENIX Annual Technical Conference, Philadelphia, PA, June 2014.
  • [55] D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. In Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, Oct. 2011.
  • [56] J. Ousterhout, A. Gopalan, A. Gupta, A. Kejriwal, C. Lee, B. Montazeri, D. Ongaro, S. J. Park, H. Qin, M. Rosenblum, S. Rumble, R. Stutsman, and S. Yang. The RAMCloud storage system. ACM TOCS, 2015.
  • [57] A. Panda, S. Han, K. Jang, M. Walls, S. Ratnasamy, and S. Shenker. NetBricks: Taking the V out of NFV. In Proc. 12th USENIX OSDI, Savannah, GA, Nov. 2016.
  • [58] M. Poke and T. Hoefler. DARE: High-performance state machine replication on RDMA networks. In HPDC, 2015.
  • [59] R. Ricci, E. Eide, and The CloudLab Team. Introducing CloudLab: Scientific infrastructure for advancing cloud architectures and applications. USENIX ;login:, 2014.
  • [60] A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren. Inside the social network’s (datacenter) network. In Proc. ACM SIGCOMM, London, UK, Aug. 2015.
  • [61] A. Saeed, N. Dukkipati, V. Valancius, V. The Lam, C. Contavalli, and A. Vahdat. Carousel: Scalable traffic shaping at end hosts. In Proc. ACM SIGCOMM, Los Angeles, CA, Aug. 2017.
  • [62] J. Shi, Y. Yao, R. Chen, H. Chen, and F. Li. Fast and concurrent RDF queries with RDMA-based distributed graph exploration. In Proc. 12th USENIX OSDI, Savannah, GA, Nov. 2016.
  • [63] M. Su, M. Zhang, K. Chen, Z. Guo, and Y. Wu. RFP: When RPC is faster than server-bypass with RDMA. In Proc. 12th ACM European Conference on Computer Systems (EuroSys), Apr. 2017.
  • [64] Y. Wang, X. Meng, L. Zhang, and J. Tan. C-Hint: An effective and reliable cache management for RDMA-accelerated key-value stores. In Proc. 5th ACM Symposium on Cloud Computing (SOCC), Seattle, WA, Nov. 2014.
  • [65] Y. Wang, L. Zhang, J. Tan, M. Li, Y. Gao, X. Guerin, X. Meng, and S. Meng. HydraDB: A resilient RDMA-driven key-value middleware for in-memory cluster computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015.
  • [66] X. Wei, J. Shi, Y. Chen, R. Chen, and H. Chen. Fast in-memory transaction processing using RDMA and HTM. In Proc. 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015.
  • [67] B. B. Welch. The Sprite remote procedure call system. Technical report, Berkeley, CA, USA, 1986.
  • [68] B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. 5th USENIX OSDI, pages 255–270, Boston, MA, Dec. 2002.
  • [69] E. Zamanian, C. Binnig, T. Harris, and T. Kraska. The end of a myth: Distributed transactions can scale. In Proc. VLDB, Munich, Germany, Aug. 2017.
  • [70] J. Zhang, F. Ren, X. Yue, R. Shu, and C. Lin. Sharing bandwidth by allocating switch buffer in data center networks. IEEE Journal on Selected Areas in Communications, 2014.
  • [71] Q. Zhang, V. Liu, H. Zeng, and A. Krishnamurthy. High-resolution measurement of data center microbursts. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, 2017.
  • [72] J. Zhou, M. Tewari, M. Zhu, A. Kabbani, L. Poutievski, A. Singh, and A. Vahdat. WCMP: Weighted cost multipathing for improved fairness in data centers. In Proc. 9th ACM European Conference on Computer Systems (EuroSys), Apr. 2014.
  • [73] Y. Zhu, H. Eran, D. Firestone, C. Guo, M. Lipshteyn, Y. Liron, J. Padhye, S. Raindel, M. H. Yahia, and M. Zhang. Congestion control for large-scale RDMA deployments. In Proc. ACM SIGCOMM, London, UK, Aug. 2015.
  • [74] Y. Zhu, M. Ghobadi, V. Misra, and J. Padhye. ECN or delay: Lessons learnt from analysis of DCQCN and TIMELY. In Proc. CoNEXT, Dec. 2016.