DLA: deep learning accelerator

COMPUTER VISION AND RECOGNITION SYSTEMS USING MACHINE AND DEEP LEARNING APPROACHES: Fundamentals, Technologies and Applications (2021)

Abstract
Machine learning (ML) algorithms and applications have been deployed, together with the Internet of Things (IoT), across a wide range of technologies, with fully smart cities as the ultimate goal. Graphics processing unit (GPU)-based systems, or combined GPU-central processing unit (CPU) systems, have been used to implement ML workloads such as deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), all of which rely on parallel multiply-accumulate (MAC) operations. GPU-based systems are flexible enough to implement different ML models and to support both their training and inference phases, but as the number of network layers grows, their energy efficiency degrades because of the increasing number of memory accesses. As highly accurate image processing, pattern recognition, and speech recognition applications are deployed and their complexity grows, methods are needed to address this problem. Software (SW), hardware (HW), and combined SW-HW approaches have therefore been proposed to meet the main challenges: memory capacity, latency, energy consumption, and bandwidth requirements. One such approach targets the deep learning accelerator's (DLA) communication infrastructure, which connects its processing elements (PEs). The traffic of a trained model is distributed across the PEs through this infrastructure, which can be realized in various structures and designs, such as application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs). For example, ASIC-based designs offer less flexibility and reconfigurability than network-on-chip (NoC)- and FPGA-based communication structures, and can only serve a specific purpose such as image processing. In this chapter, we focus on hardware approaches that improve on the energy efficiency and performance of GPU-based systems in the inference phase, namely deep learning accelerators comprising memory, a communication infrastructure, and PEs. We first explain the role different communication networks play in improving or degrading the transfer of trained DNN model data between memory and the PEs. We then describe several design approaches and investigate their impact on the efficiency of DLA-based systems, including data-flow mapping, data-flow stationarity, traffic patterns, and partitioning methods.
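To make the MAC-and-dataflow discussion concrete, the following is a minimal C sketch, not the chapter's implementation: it assumes a hypothetical 1-D convolution with made-up sizes (IN_LEN, K_LEN) and contrasts a naive schedule, which re-fetches every weight for each output, with a weight-stationary schedule in the spirit of the data-flow stationarities the chapter surveys, where each weight is held fixed (as if latched in a PE register) while inputs stream past it.

```c
#include <stdio.h>

#define IN_LEN  16                        /* input feature-map length (assumed) */
#define K_LEN    3                        /* filter length (assumed)            */
#define OUT_LEN (IN_LEN - K_LEN + 1)

/* Naive schedule: every output position re-reads all K_LEN weights. */
static void conv1d_naive(const float *in, const float *w, float *out) {
    for (int o = 0; o < OUT_LEN; o++) {
        float acc = 0.0f;
        for (int k = 0; k < K_LEN; k++)
            acc += in[o + k] * w[k];      /* one MAC operation */
        out[o] = acc;
    }
}

/* Weight-stationary schedule: each weight is loaded once and reused
 * across all outputs, trading repeated weight fetches for partial-sum
 * accumulation -- the reuse pattern a weight-stationary DLA exploits. */
static void conv1d_weight_stationary(const float *in, const float *w,
                                     float *out) {
    for (int o = 0; o < OUT_LEN; o++) out[o] = 0.0f;
    for (int k = 0; k < K_LEN; k++) {
        const float wk = w[k];            /* weight stays "stationary" */
        for (int o = 0; o < OUT_LEN; o++)
            out[o] += in[o + k] * wk;     /* stream inputs past it */
    }
}

int main(void) {
    float in[IN_LEN], w[K_LEN] = {0.25f, 0.5f, 0.25f};
    float a[OUT_LEN], b[OUT_LEN];
    for (int i = 0; i < IN_LEN; i++) in[i] = (float)i;

    conv1d_naive(in, w, a);
    conv1d_weight_stationary(in, w, b);
    for (int o = 0; o < OUT_LEN; o++)     /* both schedules agree */
        printf("out[%2d] = %6.2f  (match: %s)\n",
               o, a[o], a[o] == b[o] ? "yes" : "no");
    return 0;
}
```

Both loop nests perform the same MACs; only the order differs. That reordering is exactly what data-flow mapping changes on a DLA, and it determines how much traffic the communication infrastructure must carry between memory and the PEs.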
Keywords
Deep learning accelerator, data-flow mapping, communication infrastructure, traffic pattern