
XORing elephants: novel erasure codes for big data

PVLDB, no. 5 (2013): 325-336

Cited 759 | Views 148

Abstract

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.

Introduction
  • Facebook and many others are transitioning to erasure coding techniques to introduce redundancy while saving storage [4, 19], especially for data that is more archival in nature.
  • Using the parameters of Facebook clusters, the data blocks of each large file are grouped in stripes of 10 and for each such set, 4 parity blocks are created
  • This system (called RS (10, 4)) can tolerate any 4 block failures and has a storage overhead of only 40%.
  • RS codes are significantly more robust and storage efficient compared to replication (see the sketch after this list).
  • This storage overhead is the minimal possible, for this level of reliability [7].
  • Codes that achieve this optimal storage-reliability tradeoff are called Maximum Distance Separable (MDS) [32] and Reed-Solomon codes [27] form the most widely used MDS family.
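As a concrete comparison of the two schemes discussed in the list above, the following minimal sketch (plain Python; the only inputs are the RS (10, 4) and 3-replication parameters quoted in this summary, and all function and variable names are illustrative) computes the storage overhead and per-stripe fault tolerance of each:

```python
# Rough comparison of 3-way replication and RS (10, 4) using the figures
# quoted above; names and structure are illustrative, not from the paper.

def overhead(stored_blocks: int, data_blocks: int) -> float:
    """Extra storage as a fraction of the original data."""
    return stored_blocks / data_blocks - 1.0

# 3-way replication: each of the 10 data blocks in a stripe is stored 3 times.
replication_overhead = overhead(stored_blocks=3 * 10, data_blocks=10)  # 2.0 -> 200%

# RS (10, 4): a stripe of 10 data blocks plus 4 parity blocks.
rs_overhead = overhead(stored_blocks=10 + 4, data_blocks=10)           # 0.4 -> 40%

# Being MDS, RS (10, 4) can lose any 4 of its 14 blocks, since any 10
# surviving blocks suffice to rebuild the stripe.
print(f"3-replication: {replication_overhead:.0%} storage overhead")
print(f"RS (10, 4):    {rs_overhead:.0%} storage overhead, tolerates any 4 block failures")
```

The 40% and 200% figures match the overheads quoted above; it is the repair cost, not the storage overhead, that motivates the Locally Repairable Codes summarized next.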
Highlights
  • For this reason, Facebook and many others are transitioning to erasure coding techniques to introduce redundancy while saving storage [4, 19], especially for data that is more archival in nature
  • We introduce new erasure codes that address the main challenges of distributed data reliability and information theoretic bounds that show the optimality of our construction
  • Our Contributions: We introduce a new family of erasure codes called Locally Repairable Codes (LRCs) that are efficiently repairable both in terms of network bandwidth and disk I/O.
  • We analytically show that our codes are information theoretically optimal in terms of their locality, i.e., the number of other blocks needed to repair single block failures (a toy sketch of this locality follows this list). We present both randomized and explicit Locally Repairable Codes constructions starting from generalized Reed-Solomon parities.
  • We prove that Locally Repairable Codes have the optimal distance for that specific locality, due to an information theoretic tradeoff that we establish.
  • We observed a 2× reduction in disk I/O and network traffic at the cost of 14% more storage, a price that seems reasonable for many scenarios.
  • We introduced a new family of codes called Locally Repairable Codes (LRCs) that have marginally suboptimal storage but significantly smaller repair disk I/O and network bandwidth requirements
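The locality property referenced in the list above can be illustrated with a toy XOR sketch. This is a deliberately simplified, hypothetical layout, not the paper's generalized Reed-Solomon construction: a stripe's 10 data blocks are split into two local groups of 5, each protected by a single XOR parity, so that one lost block is rebuilt from 5 reads instead of the 10 a Reed-Solomon repair would need.

```python
# Toy illustration of locality (hypothetical layout, not the exact Xorbas code).
from functools import reduce
from operator import xor
import os

BLOCK_SIZE = 64  # bytes; kept tiny for the example

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

data = [os.urandom(BLOCK_SIZE) for _ in range(10)]  # one stripe of 10 data blocks
groups = [data[0:5], data[5:10]]                    # two local groups of 5 blocks
local_parities = [xor_blocks(group) for group in groups]

# Lose data block 7 (it belongs to group 1) and repair it from its group only.
lost = 7
group_id, offset = divmod(lost, 5)
survivors = [block for i, block in enumerate(groups[group_id]) if i != offset]
repaired = xor_blocks(survivors + [local_parities[group_id]])

assert repaired == data[lost]
print(f"block {lost} repaired from {len(survivors) + 1} reads instead of 10")
```

If two such local parities are added on top of an RS (10, 4) stripe, the stripe grows from 14 to 16 blocks, and 16/14 ≈ 1.14 is presumably where the reported 14% extra storage comes from.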
Results
  • The authors provide details on a series of experiments the authors performed to evaluate the performance of HDFSXorbas in two environments: Amazon’s Elastic Compute Cloud (EC2) [1] and a test cluster in Facebook.
  • HDFS Bytes Read corresponds to the total amount of data read by the jobs initiated for repair.
  • It is obtained by aggregating partial measurements collected from the statistics-reports of the jobs spawned following a failure event.
  • Since the cluster does not handle any external traffic, Network Traffic is equal to the amount of data moving into nodes.
  • It is measured using Amazon’s AWS Cloudwatch monitoring tools.
  • Repair Duration is calculated as the time interval between the starting time of the first repair job and the ending time of the last repair job
Conclusion
  • Modern storage systems are transitioning to erasure coding. The authors introduced a new family of codes called Locally Repairable Codes (LRCs) that have marginally suboptimal storage but significantly smaller repair disk I/O and network bandwidth requirements.
  • One related area where the authors believe locally repairable codes can have a significant impact is purely archival clusters.
  • In this case the authors can deploy large LRCs that can simultaneously offer high fault tolerance and small storage overhead.
  • This would be impractical if Reed-Solomon codes are used since the repair traffic grows linearly in the stripe size.
  • Local repairs would further allow spinning disks down [21] since very few are required for single block repairs
Tables
  • Table1: Comparison summary of the three schemes. MTTDL assumes independent node failures
  • Table2: Repair impact on workload
  • Table3: Experiment on Facebook’s Cluster Results
Related Work
  • Optimizing code designs for efficient repair is a topic that has recently attracted significant attention due to its relevance to distributed systems. There is a substantial volume of work and we only try to give a high-level overview here. The interested reader can refer to [7] and references therein.

    The first important distinction in the literature is between functional and exact repair. Functional repair means that when a block is lost, a different block is created that maintains the (n, k) fault tolerance of the code. The main problem with functional repair is that when a systematic block is lost, it will be replaced with a parity block. While global fault tolerance to n − k erasures remains, reading a single block would now require access to k blocks. While this could be useful for archival systems with rare reads, it is not practical for our workloads. Therefore, we are interested only in codes with exact repair so that we can maintain the code systematic.
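To make the functional-versus-exact distinction concrete, here is a toy sketch over a small prime field. The (4, 2) layout, the modulus, and all names are illustrative assumptions; the codes discussed in the paper are far more general.

```python
# Toy (4, 2) systematic code over the integers mod a prime, contrasting exact
# and functional repair (illustrative only).
import random

P = 2**31 - 1                                     # prime modulus
a, b = random.randrange(P), random.randrange(P)   # two "data blocks"
stored = {"a": a, "b": b, "p1": (a + b) % P, "p2": (a + 2 * b) % P}

del stored["a"]                                   # data block "a" is erased

# Exact repair recreates precisely "a" (here from b and p1), so a later read
# of that data block again touches a single block.
assert (stored["p1"] - stored["b"]) % P == a

# Functional repair instead adds *some* new coded block built from survivors,
# e.g. p3 = p1 + p2 = 2a + 3b. Any two of the stored blocks still determine
# (a, b), so fault tolerance is preserved, but "a" is no longer stored in the
# clear: reading it now takes k = 2 blocks plus a decode step.
stored["p3"] = (stored["p1"] + stored["p2"]) % P   # 2a + 3b
a_decoded = (2 * stored["p1"] - stored["p2"]) % P  # decode "a" from p1 and p2
assert a_decoded == a
```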
References
  • [3] V. Cadambe, S. Jafar, H. Maleki, K. Ramchandran, and C. Suh. Asymptotic interference alignment for optimal repair of MDS codes in distributed storage. Submitted to IEEE Transactions on Information Theory, Sep. 2011 (consolidated paper of arXiv:1004.4299 and arXiv:1004.4663).
  • [4] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, et al. Windows azure storage: A highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 143–157, 2011.
  • [5] M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica. Managing data transfers in computer clusters with orchestra. In SIGCOMM-Computer Communication Review, pages 98–109, 2011.
  • [6] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, pages 4539–4551, 2010.
  • [7] A. Dimakis, K. Ramchandran, Y. Wu, and C. Suh. A survey on network codes for distributed storage. Proceedings of the IEEE, 99(3):476–489, 2011.
  • [8] B. Fan, W. Tantisiriroj, L. Xiao, and G. Gibson. Diskreduce: Raid for data-intensive scalable computing. In Proceedings of the 4th Annual Workshop on Petascale Data Storage, pages 6–10. ACM, 2009.
  • [9] D. Ford, F. Labelle, F. Popovici, M. Stokely, V. Truong, L. Barroso, C. Grimes, and S. Quinlan. Availability in globally distributed storage systems. In Proceedings of the 9th USENIX conference on Operating systems design and implementation, pages 1–7, 2010.
  • [10] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. CoRR, abs/1106.3625, 2011.
  • [12] K. Greenan, J. Plank, and J. Wylie. Mean time to meaningless: MTTDL, Markov models, and storage system reliability. In HotStorage, 2010.
  • [13] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel. The cost of a cloud: Research problems in data center networks. Computer Communications Review (CCR), pages 68–73, 2009.
  • [14] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. SIGCOMM Comput. Commun. Rev., 39:51–62, Aug. 2009.
  • [15] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu. DCell: a scalable and fault-tolerant network structure for data centers. SIGCOMM Comput. Commun. Rev., 38:75–86, August 2008.
  • [16] T. Ho, M. Medard, R. Koetter, D. Karger, M. Effros, J. Shi, and B. Leong. A random linear network coding approach to multicast. IEEE Transactions on Information Theory, pages 4413–4430, October 2006.
  • [17] C. Huang, M. Chen, and J. Li. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. NCA, 2007.
  • [18] S. Jaggi, P. Sanders, P. A. Chou, M. Effros, S. Egner, K. Jain, and L. Tolhuizen. Polynomial time algorithms for multicast network code construction. Information Theory, IEEE Transactions on, 51(6):1973–1982, 2005.
  • [19] O. Khan, R. Burns, J. Plank, W. Pierce, and C. Huang. Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads. In FAST 2012.
  • [20] O. Khan, R. Burns, J. S. Plank, and C. Huang. In search of I/O-optimal recovery from disk failures. In HotStorage ’11: 3rd Workshop on Hot Topics in Storage and File Systems, Portland, June 2011. USENIX.
  • [21] D. Narayanan, A. Donnelly, and A. Rowstron. Write off-loading: Practical power management for enterprise storage. ACM Transactions on Storage (TOS), 4(3):10, 2008.
  • [22] F. Oggier and A. Datta. Self-repairing homomorphic codes for distributed storage systems. In INFOCOM, 2011 Proceedings IEEE, pages 1215–1223, April 2011.
  • [23] D. Papailiopoulos and A. G. Dimakis. Locally repairable codes. In ISIT 2012.
  • [24] D. Papailiopoulos, J. Luo, A. Dimakis, C. Huang, and J. Li. Simple regenerating codes: Network coding for cloud storage. Arxiv preprint arXiv:1109.0264, 2011.
  • [26] K. Rashmi, N. Shah, and P. Kumar. Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. Information Theory, IEEE Transactions on, 57(8):5227–5239, 2011.
  • [27] I. Reed and G. Solomon. Polynomial codes over certain finite fields. In Journal of the SIAM, 1960.
  • [28] R. Rodrigues and B. Liskov. High availability in dhts: Erasure coding vs. replication. Peer-to-Peer Systems IV, pages 226–239, 2005.
  • [29] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. XORing elephants: Novel erasure codes for big data. USC Technical Report 2012, available online at http://bit.ly/xorbas.
  • [30] N. Shah, K. Rashmi, P. Kumar, and K. Ramchandran. Interference alignment in regenerating codes for distributed storage: Necessity and code constructions. Information Theory, IEEE Transactions on, 58(4):2134–2158, 2012.
  • [31] I. Tamo, Z. Wang, and J. Bruck. MDS array codes with optimal rebuilding. CoRR, abs/1103.3737, 2011.
  • [32] S. B. Wicker and V. K. Bhargava. Reed-Solomon codes and their applications. IEEE Press, 1994.
  • [33] Q. Xin, E. Miller, T. Schwarz, D. Long, S. Brandt, and W. Litwin. Reliability mechanisms for very large storage systems. In MSST, pages 146–156. IEEE, 2003.