SIMD-Constrained Lookup Table for Accelerating Variable-Weighted Convolution on x86/64 CPUs

Yuki Naganawa, Hirokazu Kamei, Yamato Kanetaka, Haruki Nogami, Yoshihiro Maeda, Norishige Fukushima

IEEE Access (2024)

Abstract

Convolution is the inner product of a neighborhood signal and a set of weights, and it plays a fundamental role in image processing; accelerating it is therefore essential. Among convolutions, variable-weighted convolution is used in adaptive filters and edge-preserving smoothing to realize various applications. To accelerate these filters, some of the weights are replaced with lookup tables (LUTs). LUT reference is a classical acceleration method; however, because computing speed has grown faster than memory I/O speed, the range of cases in which LUT references pay off has narrowed. Using registers as LUTs would enable a speedup, but their small size makes this difficult. Therefore, this study proposes a downsampling method that fits LUTs into SIMD registers, which are relatively large, together with an efficient method for referencing these register-LUTs. Experimental results show that the proposed method reproduces the output with a PSNR of 65.52 dB (+25.11 dB), whereas a simple full-size LUT of register size reproduces only 40.41 dB. With wider registers, the PSNR reached 78.63 dB (+38.22 dB) with AVX-512 and 84.5 dB (+44.09 dB) with bfloat16. While exceeding the 60 dB display limit of 8-bit displays, the fastest proposed method was on average 4.82/3.72 times faster than direct vector computing, 2.99/3.10 times faster than vector addressing, and 3.79/7.80 times faster than scalar addressing on the AVX2/AVX-512 computers, respectively. Considering these speed/accuracy trade-offs, the proposed method delivered superior performance. This paper shows that LUT references can be realized in convolution even with small SIMD registers. The proposed method is expected to extend to adaptive filters, convolutional neural networks, and other image processing applications by accelerating approximations with register-LUTs. Our code is available at https://fukushimalab.github.io/registerLUT4conv/.

Keywords

Approximate computing, bilateral filtering, high-dimensional kernel filtering, high-performance computing, image filtering, nonlinear filters, parallel processing, SIMD, table lookup
