Towards smarter flash-based storage management for enterprise workloads

Dissertation (2012)

Abstract
The fast-paced evolution of processor and main memory technology has overshadowed the improvements made in the traditional storage subsystem, worsening the storage bottleneck. Systems have traditionally followed a design philosophy in which storage devices are mere data containers and the real intelligence resides in the software stack. Policies for data layout, allocation/de-allocation, aggregation, pre-fetching, buffering, etc., have been implemented as different layers of that stack. However, two complementary trends warrant a reinvestigation of this division of labor between storage hardware and software. First, significant advances in storage device technology have rendered hard disk drive (HDD) based solutions sub-optimal. Second, a growing class of applications has emerged that manipulates large quantities of unstructured/semi-structured data, necessitating newer programming paradigms such as MapReduce. Continuing with the current division of labor will therefore pose significant hurdles to eliminating the storage bottleneck. This dissertation develops a smarter management system for NAND flash based Solid State Drives (SSDs). The proposed mechanisms incorporate workload characteristics, such as temporal and value locality, into the SSD’s internal data management, improving both the device’s behavior and overall application performance. Furthermore, the dissertation demonstrates that smarter SSDs capable of performing application-specific functions can not only deliver much higher throughput than traditional SSDs but also move the performance bottleneck away from the storage subsystem.

One of the main bottlenecks of SSDs is the poor performance of random writes. Random writes matter especially for enterprise-scale applications, where multiple request streams intermingle and produce a highly randomized workload at the device level.
One of the key reasons for this degraded performance is the SSD’s internal management firmware, the Flash Translation Layer (FTL). The FTL hides the idiosyncrasies of NAND flash media by performing logical-to-physical address translation, garbage collection, and wear-leveling. The limited amount of DRAM on an SSD forces the FTL to manage all or part of the address translations at a coarser granularity. This coarse-grained mapping results in poor garbage collection behavior, because random writes exacerbate write amplification. We develop a smarter fine-grained FTL scheme, called DFTL, which selectively caches the most popular address mappings in the on-SSD DRAM, leveraging temporal locality in block-level access patterns. This workload awareness within the SSD management software significantly reduces write amplification, improving the device’s performance and lifetime. Although DFTL significantly reduces garbage collection overheads, it does not change the actual write request patterns on the device. A smarter management layer within the SSD can shape the request patterns, especially the write traffic, and thereby improve device behavior. This dissertation evaluates workloads such as mail server and home directories, and observes that certain content is accessed far more often than other content. This characteristic is leveraged to design a Content Addressable SSD (CA-SSD), which de-duplicates popular content and thereby reduces writes (including garbage-collection-related writes) on the SSD, improving its endurance and performance. Even though SSDs offer higher throughput than HDDs, the storage media continue to be the performance bottleneck. However, the processing elements within the SSD, coupled with its internal parallelism and fast NAND flash interface, make SSDs an attractive place to migrate application-specific functionality. This has the potential to move the bottleneck away from the storage devices.
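To make the DFTL idea more concrete, here is a minimal Python sketch of a demand-cached, page-level mapping table. It assumes the complete logical-to-physical map lives in flash (modelled as a plain dict) and that only `capacity` entries fit in the simulated on-SSD DRAM. The class `MappingCache` and its methods are hypothetical, and plain LRU eviction is a simplification chosen for illustration, not necessarily DFTL's exact replacement policy.

```python
from collections import OrderedDict


class MappingCache:
    """Hypothetical DFTL-style cache of page-level address mappings.

    The complete logical-to-physical map is assumed to live in flash
    (modelled by `flash_map`); only `capacity` entries are held in the
    simulated on-SSD DRAM and are evicted in LRU order (a simplification).
    """

    def __init__(self, capacity, flash_map):
        self.capacity = capacity
        self.flash_map = flash_map      # stand-in for translation pages on flash
        self.cache = OrderedDict()      # lpn -> (ppn, dirty), least recent first
        self.hits = 0
        self.misses = 0
        self.writebacks = 0             # dirty mappings flushed back to flash

    def _fetch(self, lpn):
        # Cache miss: evict the LRU entry if full, then load lpn's mapping.
        self.misses += 1
        if len(self.cache) >= self.capacity:
            victim, (ppn, dirty) = self.cache.popitem(last=False)
            if dirty:
                self.flash_map[victim] = ppn
                self.writebacks += 1
        # -1 marks a logical page that has never been written.
        self.cache[lpn] = (self.flash_map.get(lpn, -1), False)

    def _touch(self, lpn):
        if lpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(lpn)  # mark as most recently used
        else:
            self._fetch(lpn)

    def lookup(self, lpn):
        """Translate a logical page number to its physical page number."""
        self._touch(lpn)
        return self.cache[lpn][0]

    def update(self, lpn, new_ppn):
        """Record an out-of-place write: lpn now lives at new_ppn (dirty in DRAM)."""
        self._touch(lpn)
        self.cache[lpn] = (new_ppn, True)


fm = {lpn: 1000 + lpn for lpn in range(8)}
cache = MappingCache(capacity=2, flash_map=fm)
for lpn in [0, 0, 1, 0, 2, 0]:   # skewed, temporally local accesses
    cache.lookup(lpn)
print(cache.hits, cache.misses)   # 3 3
```

Even with room for only two mappings, half of these skewed lookups hit in DRAM; dirty mappings are written back to flash only when evicted, which is where temporal locality saves translation-page writes.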
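The de-duplication idea behind CA-SSD can be sketched in a similar way. The hypothetical `DedupFTL` below maps each logical page to a fingerprint of its content, so writing content that is already stored only increments a reference count instead of programming a new flash page. SHA-1 fingerprints and the append-only page allocator are illustrative assumptions, not necessarily what CA-SSD uses.

```python
import hashlib


class DedupFTL:
    """Hypothetical content-addressed mapping layer in the spirit of CA-SSD."""

    def __init__(self):
        self.lpn_to_fp = {}     # logical page -> content fingerprint
        self.fp_to_ppn = {}     # fingerprint -> physical page holding that content
        self.refcount = {}      # fingerprint -> number of logical pages sharing it
        self.next_ppn = 0       # naive append-only physical page allocator
        self.flash_writes = 0   # pages actually programmed on flash
        self.invalid_pages = 0  # pages left for garbage collection to reclaim

    def write(self, lpn, data):
        fp = hashlib.sha1(data).digest()
        old = self.lpn_to_fp.get(lpn)
        if old == fp:                 # same content rewritten: no flash write needed
            return self.fp_to_ppn[fp]
        if fp in self.fp_to_ppn:      # duplicate content: share the existing page
            self.refcount[fp] += 1
        else:                         # new content: program a fresh flash page
            self.fp_to_ppn[fp] = self.next_ppn
            self.refcount[fp] = 1
            self.next_ppn += 1
            self.flash_writes += 1
        self.lpn_to_fp[lpn] = fp
        if old is not None:           # drop this lpn's reference to its old content
            self.refcount[old] -= 1
            if self.refcount[old] == 0:
                del self.refcount[old]
                del self.fp_to_ppn[old]
                self.invalid_pages += 1
        return self.fp_to_ppn[fp]

    def read(self, lpn):
        """Return the physical page currently backing a logical page."""
        return self.fp_to_ppn[self.lpn_to_fp[lpn]]


ftl = DedupFTL()
page_a, page_b = b"A" * 4096, b"B" * 4096
ftl.write(0, page_a)
ftl.write(1, page_a)     # duplicate of logical page 0: deduplicated
ftl.write(2, page_b)
print(ftl.flash_writes)  # 2
```

Fewer programmed pages also means fewer pages that later become invalid, which is how de-duplication reduces garbage-collection-related writes as well as host writes.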
This dissertation develops an optimization framework that demonstrates the advantages of these Smart SSDs for both data-transfer-intensive and compute-intensive workloads. Smart SSDs reduce data transfer costs over the I/O interconnects, and they provide parallel compute elements that meet application demands better than their traditional counterparts. In summary, this dissertation provides a comprehensive framework for leveraging workload characteristics and NAND flash properties to build a smarter management system for SSDs. The resulting Smart SSDs improve device performance and endurance and alleviate the storage bottleneck.
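A simple way to see why moving work into the device reduces interconnect traffic is to compare a host-side scan with an in-device filter. In this sketch, `RECORD_SIZE`, `Record`, and both scan functions are hypothetical. The per-channel shards only model work that embedded cores would perform concurrently; here they run sequentially.

```python
from dataclasses import dataclass

RECORD_SIZE = 128  # bytes per record; an assumed, illustrative value


@dataclass
class Record:
    key: int
    value: int


def host_side_scan(records, predicate):
    """Conventional SSD: every record crosses the I/O interconnect."""
    transferred = len(records) * RECORD_SIZE
    matches = [r for r in records if predicate(r)]  # filtering happens on the host
    return matches, transferred


def device_side_scan(records, predicate, channels=4):
    """Smart SSD (hypothetical): cores filter per flash channel; only matches are sent."""
    shards = [records[i::channels] for i in range(channels)]  # one shard per channel
    # On a real device each shard would be filtered concurrently; here sequentially.
    matches = [r for shard in shards for r in shard if predicate(r)]
    transferred = len(matches) * RECORD_SIZE
    return matches, transferred


def wanted(r):
    return r.value == 0  # selects 10% of the records below


records = [Record(key=k, value=k % 10) for k in range(1000)]
_, host_bytes = host_side_scan(records, wanted)
_, device_bytes = device_side_scan(records, wanted)
print(host_bytes, device_bytes)  # 128000 12800
```

The saving grows as the predicate becomes more selective. This sketch ignores the cost of the embedded cores themselves, which a framework for deciding what to offload would also have to weigh.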
Keywords
flash-based storage management, workload characteristic, Smart SSDs, performance bottleneck, storage device technology, management system, random writes, enterprise workloads, storage bottleneck, storage subsystem, storage device, garbage collection