A Systematic Approach for Acceleration of Matrix-Vector Operations in CGRA through Algorithm-Architecture Co-Design

2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID), 2019

Abstract
Matrix-vector operations play a pivotal role in engineering and scientific applications ranging from machine learning to computational finance. These operations have a time complexity of O(n^2) and are challenging to accelerate because they are memory bound: the ratio of arithmetic operations to data movement is O(1). In this paper, we present a systematic methodology of algorithm-architecture co-design to accelerate matrix-vector operations, with emphasis on matrix-vector multiplication (gemv) and vector transpose-matrix multiplication (vtm). In our methodology, we perform a detailed analysis of the directed acyclic graphs of these routines and identify macro operations that can be realized on a reconfigurable data-path tightly coupled to the pipeline of a processing element (PE). The PE clearly outperforms state-of-the-art realizations of gemv and vtm, attaining a 135% performance improvement over multicore processors and 200% over general-purpose graphics processing units. In a parallel realization on the REDEFINE coarse-grained reconfigurable architecture, the solution is shown to be scalable.
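For context, the sketch below shows the two routines in plain C; it is not the paper's PE or REDEFINE implementation, only a reference formulation. Each kernel performs roughly 2n^2 floating-point operations while reading about n^2 matrix elements, so the arithmetic-to-data-movement ratio stays O(1), which is why these operations are memory bound.

```c
#include <stddef.h>

/* Reference gemv: y = A * x, with A stored row-major (n x n).
   About 2*n*n flops over n*n matrix reads -> O(1) arithmetic intensity. */
void gemv(size_t n, const double *A, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* Reference vtm: y^T = x^T * A, i.e. y[j] = sum_i x[i] * A[i][j].
   Same flop count and data volume as gemv, but column-wise access of A. */
void vtm(size_t n, const double *A, const double *x, double *y) {
    for (size_t j = 0; j < n; ++j)
        y[j] = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            y[j] += x[i] * A[i * n + j];
}
```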
Keywords
dense linear algebra, matrix-vector operations, instruction level parallelism, scalability