PRIMO: A Full-Stack Processing-in-DRAM Emulation Framework for Machine Learning Workloads

2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2023)

Abstract
Recently, the size of deep learning models has increased significantly, making the excessive memory traffic between the AI processor and DRAM a major system bottleneck. The processing-in-DRAM (DRAM-PIM) concept has emerged as a promising solution: by integrating computing logic within memory, it eliminates much of the access to external memory. Although many simulators have been proposed to model and analyze the benefits of DRAM-PIM, they are often too slow to run an entire application. FPGA-based emulators have been introduced to overcome this limitation; however, no prior work includes the full software stack from the model down to the DRAM-PIM hardware. This paper presents PRIMO, a full-stack processing-in-DRAM emulation framework and the first that can model and analyze DRAM-PIM for end-to-end ML inference. PRIMO enables software developers to develop and test customized software stacks on various ML workloads without requiring a real DRAM-PIM chip. It also allows designers to explore the design space and monitor memory access patterns, facilitating software-hardware co-design of efficient DRAM-PIM architectures. To achieve these goals, we develop a real-time FPGA emulator that emulates the DRAM-PIM architecture and produces experimental results, such as predicted cycle counts and computed outputs, at speeds far beyond CPU-based simulation. In addition, we propose a software stack comprising a PIM compiler that enables the execution of various ML workloads, including end-to-end inference, and a PIM driver that runs those workloads with high bandwidth utilization by leveraging virtual-memory scatter-gather DMA. Finally, we demonstrate that PRIMO emulates DRAM-PIM 106.64-6093.56x faster than a CPU-based simulation framework across ML workloads ranging from small microbenchmarks to end-to-end inference of ResNets.
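The abstract credits the PIM driver's bandwidth utilization to virtual-memory scatter-gather DMA. Below is a minimal sketch of that general technique, not PRIMO's actual driver code: a virtually contiguous tensor is split at page boundaries into a chain of descriptors, so a single DMA kick can stream the scattered physical pages back-to-back without an intermediate copy. All names here (`sg_desc`, `build_sg_chain`, the identity `virt_to_phys`) are invented for illustration.

```c
/* Hypothetical scatter-gather descriptor chain builder; illustrative only,
 * not PRIMO's driver API. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* One DMA descriptor: a physical address/length pair plus a link to the
 * next descriptor, as in typical scatter-gather DMA engines. */
struct sg_desc {
    uint64_t        phys_addr;  /* physical address of this chunk   */
    uint32_t        length;     /* bytes to transfer from the chunk */
    struct sg_desc *next;       /* next descriptor; NULL terminates */
};

/* Stand-in for the driver's virtual-to-physical translation (a real
 * driver would resolve pinned pages via the OS). */
static uint64_t virt_to_phys(const void *vaddr) {
    return (uint64_t)(uintptr_t)vaddr;  /* identity map for the sketch */
}

/* Split a virtually contiguous buffer into page-sized descriptors so the
 * DMA engine can stream the whole tensor in one transfer. */
struct sg_desc *build_sg_chain(const void *vaddr, size_t len) {
    struct sg_desc *head = NULL, **tail = &head;
    const uint8_t *p = vaddr;

    while (len > 0) {
        /* Transfer at most up to the next page boundary. */
        size_t offset = (uintptr_t)p & (PAGE_SIZE - 1);
        size_t chunk  = PAGE_SIZE - offset;
        if (chunk > len)
            chunk = len;

        struct sg_desc *d = malloc(sizeof(*d));
        if (!d)
            return head;  /* sketch: real code would unwind the chain */
        d->phys_addr = virt_to_phys(p);
        d->length    = (uint32_t)chunk;
        d->next      = NULL;

        *tail = d;        /* append to the chain */
        tail  = &d->next;
        p    += chunk;
        len  -= chunk;
    }
    return head;
}
```

The payoff of this style of driver is that tensors allocated in ordinary pageable virtual memory need not be staged into a physically contiguous bounce buffer, which is what keeps the DMA engine saturated during PIM command streaming.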