H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

ACM transactions on design automation of electronic systems（2024）

引用 0|浏览16

暂无评分

摘要

Prior hardware accelerator designs primarily focused on single-chip solutions for 10 MB-class computer vi-sion models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator design due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for trans-former models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advan-tage of heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the BERT and GPT2 model, which is about 2.6 x similar to 3.1 x higher than the baseline with 7 nm TPU and stacked FeFET memory.

查看译文

关键词

Compute-in-memory,DNN accelerator,heterogeneous 3D integration,multi-head self-attention,transformer

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要