Accelerating Prefix Scan with in-network computing on Intel PIUMA.

HiPC (2022)

Abstract
Prefix Scan is a versatile collective used in several classes of algorithms, including sorting, lexical analysis, graph analytics, and regex matching. It is also a powerful tool for performing tree operations and load balancing. However, host-based Prefix Scan implementations incur high latency, heavy network traffic, and poor scalability on large distributed systems. We explore in-network computation to accelerate Prefix Scan, using switches with data aggregation capabilities. We discuss the fundamental challenges associated with offloading Prefix Scan onto a network, and resolve them with innovations in dataflow topology and embedding methodology. We implement the proposed approach on the Intel PIUMA system. To the best of our knowledge, this is the first realization of Prefix Scan offloaded onto network switches. Our in-network Prefix Scan is highly scalable, with less than 5 μs latency on 16K PIUMA nodes and 6x lower latency than the host-based Prefix Scan. The performance benefits translate directly to improved workload scalability, as we demonstrate using a key bioinformatics application, Sequence Alignment.
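
For readers unfamiliar with the collective, the following is a minimal sketch of what a Prefix Scan computes, together with a simulated recursive-doubling schedule of the kind a host-based implementation might use. It is an illustration only and is not the in-network method described in the abstract. The function names `inclusive_scan` and `recursive_doubling_scan` are hypothetical, and each list element stands in for the value held by one node.

```python
from operator import add


def inclusive_scan(values, op=add):
    """Sequential reference: out[i] = values[0] op ... op values[i]."""
    out = []
    for v in values:
        out.append(v if not out else op(out[-1], v))
    return out


def recursive_doubling_scan(values, op=add):
    """Simulate a scan across len(values) ranks in about log2(n) rounds.

    Hypothetical model: in each round, rank i combines its partial result
    with the one held by rank i - distance, then distance doubles.
    """
    result = list(values)
    distance = 1
    while distance < len(result):
        # Ranks below `distance` already hold their final prefix.
        result = [
            result[i] if i < distance else op(result[i - distance], result[i])
            for i in range(len(result))
        ]
        distance *= 2
    return result


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(inclusive_scan(data))
    print(recursive_doubling_scan(data))
```

Both functions return the same result for any associative operator. In this simulated schedule every rank exchanges data in every round, which gives a rough sense of why host-based scans generate substantial network traffic at scale.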
Keywords
Collectives, In-network Computing, Prefix Scan