SSD failures in the field: symptoms, causes, and prediction models

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis(2019)

引用 41|浏览31
暂无评分
摘要
In recent years, solid state drives (SSDs) have become a staple of high-performance data centers for their speed and energy efficiency. In this work, we study the failure characteristics of 30,000 drives from a Google data center spanning six years. We characterize the workload conditions that lead to failures and illustrate that their root causes differ from common expectation but remain difficult to discern. Particularly, we study failure incidents that result in manual intervention from the repair process. We observe high levels of infant mortality and characterize the differences between infant and non-infant failures. We develop several machine learning failure prediction models that are shown to be surprisingly accurate, achieving high recall and low false positive rates. These models are used beyond simple prediction as they aid us to untangle the complex interaction of workload characteristics that lead to failures and identify failure root causes from monitored symptoms.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要