SFDoP: A Scalable Fused BFloat16 Dot-Product Architecture for DNN

2023 IEEE 41st International Conference on Computer Design (ICCD), 2023

Abstract
The BFloat16 (BF16) format has emerged as a driving force in Deep Neural Networks (DNNs), owing to its higher energy efficiency and lower memory footprint compared with traditional formats. Since the BF16 format is used mainly in computation-intensive layers such as general matrix multiplication (GEMM), this paper presents SFDoP, a scalable BF16 fused dot-product (DoP) architecture for high-performance DNN computation. SFDoP is built around a novel fused 4-term DoP unit that performs a 4-term DoP operation in three cycles; DoP units with more terms are constructed by extending this basic unit. The extended units incorporate early exponent comparison to mask latency and omit intermediate normalization and rounding to further improve performance. Compared with discrete designs, SFDoP-4 reduces the latency of a 4-term DoP operation by 15.6%, with greater reductions in the extended units. Compared with existing BF16 designs, SFDoP improves throughput and energy efficiency by at least 82.2% and 28.1%, respectively. For large GEMM operations, the extended units outperform the basic unit.
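To make the fused approach concrete, the sketch below is a minimal Python behavioral model of a fused 4-term BF16 dot product, not the SFDoP hardware itself: the exponents of all four products are compared up front, the exact significand products are aligned to the largest product exponent, summed in a single wide accumulator, and normalized and rounded only once at the end. The function names, bit widths, and the simplifying assumption of normal, non-overflowing inputs are illustrative choices of this sketch, not details taken from the paper.

```python
import struct

def f32_to_bf16(x: float) -> int:
    """Truncate a float32 to BF16 by keeping its upper 16 bits (rounding omitted for brevity)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return (bits >> 16) & 0xFFFF

def bf16_to_f32(b: int) -> float:
    """Expand a BF16 bit pattern back to float32."""
    return struct.unpack('<f', struct.pack('<I', (b & 0xFFFF) << 16))[0]

def fused_dotp4(a: list[int], b: list[int]) -> int:
    """Behavioral model of a fused 4-term BF16 dot product (normal inputs assumed).

    The four significand products are formed exactly, aligned to the largest
    product exponent (one early exponent comparison for all terms), accumulated
    in a single wide register, and normalized/rounded only once at the end.
    """
    prods = []
    for ai, bi in zip(a, b):
        sa, ea, ma = (ai >> 15) & 1, (ai >> 7) & 0xFF, (ai & 0x7F) | 0x80  # restore hidden bit
        sb, eb, mb = (bi >> 15) & 1, (bi >> 7) & 0xFF, (bi & 0x7F) | 0x80
        p = ma * mb                        # exact 16-bit product of the two 8-bit significands
        ep = (ea - 127) + (eb - 127) - 14  # scale: p * 2**ep equals the true product value
        prods.append((sa ^ sb, ep, p))
    emax = max(ep for _, ep, _ in prods)   # single exponent comparison across all four products
    acc = 0
    for sp, ep, p in prods:
        aligned = p >> min(emax - ep, 63)  # align to emax; the finite shift models limited datapath width
        acc += -aligned if sp else aligned
    # One normalization + rounding step at the very end (delegated to the float conversion here).
    return f32_to_bf16(acc * (2.0 ** emax))
```

For example, fused_dotp4([f32_to_bf16(x) for x in (1.5, 2.0, -0.5, 3.0)], [f32_to_bf16(1.0)] * 4) returns the BF16 encoding of 6.0. The hardware described in the paper pipelines these steps over three cycles, which this purely functional sketch does not attempt to model.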
Keywords
AI computation, BF16 format, DoP operation, scalable architecture, fused floating-point operations