Parendi: Thousand-Way Parallel RTL Simulation
arXiv (2024)
Abstract
Hardware development relies on simulations, particularly cycle-accurate RTL
(Register Transfer Level) simulations, which consume significant time. As
single-processor performance grows only slowly, conventional, single-threaded
RTL simulation is becoming less practical for increasingly complex chips and
systems. One solution is parallel RTL simulation, in which simulators would
ideally run on thousands of parallel cores. However, existing simulators can
only exploit tens of cores.
This paper studies the challenges inherent in running parallel RTL simulation
on a multi-thousand-core machine (the Graphcore IPU, a 1472-core machine).
Simulation performance requires balancing three factors: synchronization,
communication, and computation. We experimentally evaluate each factor and
analyze how it affects parallel simulation speed, contrasting the large-scale
IPU with smaller but faster x86 systems.
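The abstract does not give a formula for how the three factors combine. Purely as an illustration, the toy model below assumes that one simulated cycle costs its per-core computation (with work split perfectly evenly), plus a tree-barrier synchronization whose depth grows as log2 of the core count, plus a fixed per-core communication time. The function names (`cycle_time_us`, `best_core_count`) and all parameter values are hypothetical, not measurements from the paper. The point is only that adding cores shrinks computation while synchronization keeps growing, so past some core count more cores stop helping.

```python
import math


def cycle_time_us(total_ops, cores, ops_per_us, barrier_us_per_level,
                  bytes_per_core, link_bytes_per_us):
    """Toy estimate (microseconds) of wall-clock time for one simulated cycle."""
    # Computation: assume the design's work splits perfectly evenly over cores.
    compute = total_ops / (cores * ops_per_us)
    if cores == 1:
        return compute  # a single core needs no barrier and no exchange
    # Synchronization: assume a tree barrier with ceil(log2(P)) levels.
    sync = barrier_us_per_level * math.ceil(math.log2(cores))
    # Communication: assume each core sends a fixed amount over its own link.
    comm = bytes_per_core / link_bytes_per_us
    return compute + sync + comm


def best_core_count(candidates, **params):
    """Return the candidate core count with the lowest modeled cycle time."""
    return min(candidates, key=lambda p: cycle_time_us(cores=p, **params))


if __name__ == "__main__":
    # Illustrative, made-up parameters (not from the paper).
    params = dict(total_ops=1_000_000, ops_per_us=1_000,
                  barrier_us_per_level=1.0, bytes_per_core=64,
                  link_bytes_per_us=64)
    counts = [2 ** k for k in range(13)]  # 1 .. 4096 cores
    for p in counts:
        print(f"{p:5d} cores: {cycle_time_us(cores=p, **params):9.3f} us/cycle")
    print("best:", best_core_count(counts, **params))
```

With these made-up parameters the modeled optimum is 512 cores: doubling to 1024 saves about 0.98 us of computation per cycle but adds a full 1 us barrier level.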
Using this analysis, we build Parendi, an RTL simulator for the IPU. It
distributes RTL simulation across 5888 cores on 4 IPU sockets. Parendi runs
large RTL designs up to 4x faster than a powerful, state-of-the-art x86
multicore system.
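Cycle-accurate simulation has a natural place for the synchronization and communication discussed above: every register must see the previous cycle's values before any register updates. The sketch below is a minimal illustration, assuming a bulk-synchronous schedule (compute, barrier, exchange) and emulating the cores sequentially in plain Python. The toy design, the partitioning, and the function names are hypothetical; this is not Parendi's implementation. It checks that the partitioned schedule reproduces a single-threaded reference.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
NextFn = Callable[[State], int]  # next-state function of one register


def simulate_sequential(regs: State, next_fns: Dict[str, NextFn], cycles: int) -> State:
    """Single-threaded reference: update every register from one snapshot."""
    state = dict(regs)
    for _ in range(cycles):
        state = {name: fn(state) for name, fn in next_fns.items()}
    return state


def simulate_partitioned(regs: State, next_fns: Dict[str, NextFn],
                         partitions: List[List[str]], cycles: int) -> State:
    """Emulate a bulk-synchronous schedule; each partition stands for one core."""
    state = dict(regs)
    for _ in range(cycles):
        # Compute phase: each "core" evaluates only its own registers, reading
        # the previous-cycle snapshot and writing to a private buffer.
        local = [{name: next_fns[name](state) for name in part} for part in partitions]
        # Barrier + exchange phase: once every buffer is complete, merge them
        # into the snapshot that all cores read in the next cycle.
        state = {}
        for buf in local:
            state.update(buf)
    return state


# Hypothetical toy design: an 8-bit counter, an accumulator, a threshold flag,
# and a 4-bit shift register fed by the flag.
NEXT: Dict[str, NextFn] = {
    "cnt": lambda s: (s["cnt"] + 1) & 0xFF,
    "acc": lambda s: (s["acc"] + s["cnt"]) & 0xFF,
    "flag": lambda s: 1 if s["acc"] > 100 else 0,
    "shift": lambda s: ((s["shift"] << 1) | s["flag"]) & 0xF,
}
INIT: State = {"cnt": 0, "acc": 0, "flag": 0, "shift": 0}
PARTS = [["cnt", "flag"], ["acc"], ["shift"]]  # three "cores"

if __name__ == "__main__":
    ref = simulate_sequential(INIT, NEXT, 50)
    par = simulate_partitioned(INIT, NEXT, PARTS, 50)
    print(ref == par, ref)
```

Because each core writes only to a private buffer during the compute phase, the cores never race on shared state; the price is one barrier and one exchange per simulated cycle, which is exactly the synchronization and communication overhead the abstract says must be balanced against computation.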