Implementations Impact on Iterative Image Processing for Embedded GPU

29th European Signal Processing Conference (EUSIPCO 2021)

Abstract
The emergence of low-power embedded Graphics Processing Units (GPUs) with high computation capabilities has enabled the integration of image processing chains in a wide variety of embedded systems. However, various optimisation techniques are needed to get the most out of an embedded GPU. This paper explores several optimisation methods for iterative stencil-like image processing algorithms on embedded NVIDIA GPUs using the Compute Unified Device Architecture (CUDA) API. We focus our architectural optimisations on the TV-L1 algorithm, an optical flow estimation method based on total variation (TV) regularisation and the L1 norm. It is widely used as a model for more complex optical flow estimation methods and appears in many recent video processing applications. In this work we evaluate the impact of architecture-oriented optimisations on both execution time and energy consumption on several NVIDIA Jetson embedded GPU boards. Results show a speedup of up to 3x compared to state-of-the-art versions, as well as a 2.6x decrease in energy consumption.
Keywords
GPU, Embedded System, Image Processing, TV-L1, Optical flow, Energy Consumption