Compact Filters for Fast Online Data Partitioning

Qing Zheng, Charles D. Cranor, Ankush Jain, Gregory R. Ganger, Garth A. Gibson, George Amvrosiadis, Bradley W. Settlemyer, Gary Grider

2019 IEEE International Conference on Cluster Computing (CLUSTER) (2019)

Abstract

We are approaching a point in time when it will be infeasible to catalog and query data after it has been generated. This trend has fueled research on in-situ data processing (i.e. operating on data as it is streamed to storage). One important example of this approach is in-situ data indexing. Prior work has shown the feasibility of indexing at scale as a two-step process. First, one partitions data by key across the CPU cores of a parallel job. Then each core indexes its subset as data is persisted. Online partitioning requires transferring data over the network so that it can be indexed and stored by the core responsible for the data. This approach is becoming increasingly costly as new computing platforms emphasize parallelism instead of individual core performance that is crucial for communication libraries and systems software in general. In addition to indexing, scalable online data partitioning is also useful in other contexts such as load balancing and efficient compression. We present FilterKV, an efficient data management scheme for fast online data partitioning of key-value (KV) pairs. FilterKV reduces the total amount of data sent over the network and to storage. We achieve this by: (a) partitioning pointers to KV pairs instead of the KV pairs themselves and (b) using a compact format to represent and store KV pointers. Results from LANL show that FilterKV can reduce total write slowdown (including partitioning overhead) by up to 3x across 4096 CPU cores.
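
Point (a) above, partitioning pointers rather than the KV pairs themselves, can be illustrated with a minimal sketch. This is not FilterKV's implementation: the names `Rank`, `owner_of`, `write`, and `read` are hypothetical, the "network" is simulated with in-process objects, and a plain dictionary stands in for the compact pointer format of point (b), which the abstract does not specify.

```python
import zlib

NUM_RANKS = 4  # hypothetical number of writer processes (assumption)


def owner_of(key: bytes) -> int:
    """Hash-partition a key to the rank responsible for indexing it."""
    return zlib.crc32(key) % NUM_RANKS


class Rank:
    """One simulated process: holds its own data plus the pointers it owns."""

    def __init__(self, rank_id: int):
        self.rank_id = rank_id
        self.local_data = {}  # KV pairs stay on the rank that produced them
        self.pointers = {}    # key -> id of the rank that stores the value


def write(ranks, src: int, key: bytes, value: bytes) -> None:
    # The full KV pair is kept locally on the producing rank.
    ranks[src].local_data[key] = value
    # Only a small pointer (key -> source rank) goes to the key's owner.
    ranks[owner_of(key)].pointers[key] = src


def read(ranks, key: bytes) -> bytes:
    # Step 1: ask the owner of the key which rank stores the value.
    src = ranks[owner_of(key)].pointers[key]
    # Step 2: fetch the value from that rank's local data.
    return ranks[src].local_data[key]


ranks = [Rank(i) for i in range(NUM_RANKS)]
write(ranks, 2, b"particle-17", b"x" * 64)
write(ranks, 0, b"particle-42", b"y" * 64)
value = read(ranks, b"particle-17")
```

In this sketch a 64-byte value never leaves the rank that produced it; only the key and a small rank id are routed to the owner. A read pays for that with an extra hop: first to the owner of the key, then to the rank that holds the value.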

Keywords

data processing, in-situ data indexing, data partitioning, core indexes, online partitioning, scalable online data partitioning, data management scheme, fast online data partitioning, partitioning pointers, CPU cores, catalog, query data, partitioning overhead
