Faster Modular Exponentiation Using Double Precision Floating Point Arithmetic on the GPU

Niall Emmart, Fangyu Zheng, Charles C. Weems

2018 IEEE 25th Symposium on Computer Arithmetic (ARITH) (2018)

Abstract

This paper presents a new approach to integer multiple precision (MP) modular exponentiation, using double-precision floating point (DPF) operations, that is suitable for GPU implementation. We show speedups ranging from 20% to 34% over the best prior GPU times for sizes corresponding to common RSA cryptographic operations (2048 to 4096 bits). Three techniques are described. First, by adding 2¹⁰⁴ to the high half of the product, and 2⁵² to the low half, we set the implicit leading 1 in the DPF mantissa so that the full 52 explicit bits are available for each half of the 104-bit products of samples. Second, the DPF values are cast bitwise to 64-bit integers for adding the column sums to get the MP result. Normally the cast would require masking off the exponents, but because they are constant, we can include them in the column sums and correct just once for their total. Third, by initializing the column sums with the appropriate negative value to compensate for the exponent sums, no corrective subtraction is needed. Our implementation on an NVIDIA GTX Titan Black GPU achieves between 132.5K and 161.9K modular exponentiations per second of size 1024 bits, with latencies ranging from 21.7 ms to 17.8 ms, making it practical for online RSA applications. Proportional results are shown for 1536 and 2048 bits. The implementation is so efficient that its maximum sustained performance is actually bounded by the thermal limit of the GPU.
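
The three techniques can be illustrated with a minimal CPU sketch in C++. It is not the authors' GPU code, and it rests on several assumptions that the abstract does not state: limbs are 52-bit integers stored little-endian, the names `Halves`, `split_product`, `bits`, `mp_mul`, `EXP_LO`, `EXP_HI`, and `MASK52` are hypothetical, and the routine is a plain schoolbook multiplication rather than a full modular exponentiation. Because the sketch runs under the default round-to-nearest mode, it adds a small fix-up step; a GPU kernel could presumably avoid that with a truncating fused multiply-add such as CUDA's `__fma_rz`, though the abstract does not say which rounding mode is used.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Constant IEEE-754 exponent fields carried by every low and high half.
constexpr std::uint64_t EXP_LO = 0x433ULL << 52;  // bit pattern of 2^52
constexpr std::uint64_t EXP_HI = 0x467ULL << 52;  // bit pattern of 2^104
constexpr std::uint64_t MASK52 = (1ULL << 52) - 1;

struct Halves { double hi, lo; };

// Technique 1: split a*b (a, b integers below 2^52 held in doubles) so that
// hi == 2^104 + floor(a*b / 2^52) * 2^52 and lo == 2^52 + (a*b mod 2^52).
Halves split_product(double a, double b) {
    const double two52 = std::ldexp(1.0, 52);
    const double two104 = std::ldexp(1.0, 104);
    double hi = std::fma(a, b, two104);                 // multiple of 2^52
    double lo = std::fma(a, b, (two104 + two52) - hi);  // exact remainder
    if (lo < two52) {  // assumption: round-to-nearest rounded hi up; undo it
        hi -= two52;
        lo += two52;
    }
    return {hi, lo};
}

// Technique 2: reinterpret a double as a 64-bit integer without masking.
std::uint64_t bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Schoolbook product of two numbers given as little-endian 52-bit limbs.
// Assumes few enough limbs (well under 2^11) that column sums fit in 64 bits.
std::vector<std::uint64_t> mp_mul(const std::vector<std::uint64_t>& x,
                                  const std::vector<std::uint64_t>& y) {
    std::vector<std::uint64_t> col(x.size() + y.size(), 0);
    // Technique 3: pre-load each column with minus the exponent bits it will
    // receive (unsigned arithmetic wraps modulo 2^64).
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j) {
            col[i + j] -= EXP_LO;
            col[i + j + 1] -= EXP_HI;
        }
    // Accumulate raw bit patterns; the constant exponents cancel the pre-load.
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j) {
            Halves p = split_product(static_cast<double>(x[i]),
                                     static_cast<double>(y[j]));
            col[i + j] += bits(p.lo);
            col[i + j + 1] += bits(p.hi);
        }
    // Propagate carries so every column becomes a 52-bit limb.
    std::uint64_t carry = 0;
    for (std::size_t k = 0; k < col.size(); ++k) {
        std::uint64_t t = col[k] + carry;
        col[k] = t & MASK52;
        carry = t >> 52;
    }
    return col;
}
```

In this sketch the pre-load is computed in a loop for clarity; since the number of terms per column depends only on the operand sizes, it could equally be a precomputed constant per column.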

Keywords

common RSA cryptographic operations, DPF mantissa, column sums, integer multiple precision modular exponentiation, exponent sum compensation, online RSA applications, multiple precision modular exponentiation, double precision floating point arithmetic, faster modular exponentiation, NVIDIA GTX Titan Black GPU, appropriate negative value, word length 1536.0 bit, time 17.8 ms to 21.7 ms, word length 1024.0 bit, word length 2048 bit to 4096 bit, word length 104 bit, word length 64 bit
