Block floating point computations using reduced bit-width vectors

Daniel Lo, Eric S. Chung, Douglas C. Burger

2020

Abstract
A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. The bit-width of the block floating point number is reduced by decomposing it into a plurality of numbers, each having a mantissa portion with a bit-width smaller than that of the original mantissa portion. Dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit-width computations allow higher-precision mathematical operations to be performed on lower-precision processors with improved accuracy.
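The decomposition is straightforward to sketch. Below is a minimal Python illustration, not the patented implementation: the names `split_mantissas` and `bfp_dot`, the 8-bit mantissa width, and the 4-bit piece width are all assumptions for demonstration. Each 8-bit mantissa is split into two 4-bit pieces, four narrow partial dot products are computed, and the full-precision result is recovered as a weighted sum, with the shared block exponents applied once at the end.

```python
import numpy as np

def split_mantissas(m, piece_bits=4):
    """Split signed integer mantissas so that m == (hi << piece_bits) + lo,
    with lo in [0, 2**piece_bits). Names and widths are illustrative."""
    lo = m & ((1 << piece_bits) - 1)   # low bits, non-negative
    hi = m >> piece_bits               # arithmetic shift preserves sign
    return hi, lo

def bfp_dot(mant_a, exp_a, mant_b, exp_b, piece_bits=4):
    """Dot product of two block floating point vectors, computed as a sum
    of narrow (reduced bit-width) partial dot products."""
    a_hi, a_lo = split_mantissas(mant_a, piece_bits)
    b_hi, b_lo = split_mantissas(mant_b, piece_bits)
    s = piece_bits
    # Each partial dot product only touches piece_bits-wide operands,
    # so it fits low-precision multiply-accumulate hardware.
    acc = (int(np.dot(a_hi, b_hi)) << (2 * s)) \
        + ((int(np.dot(a_hi, b_lo)) + int(np.dot(a_lo, b_hi))) << s) \
        + int(np.dot(a_lo, b_lo))
    # The shared block exponents are applied once, to the final sum.
    return acc * 2.0 ** (exp_a + exp_b)

# Check against a direct dot product on the full-width mantissas.
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=16, dtype=np.int32)  # 8-bit mantissas
b = rng.integers(-128, 128, size=16, dtype=np.int32)
exp_a = exp_b = -7                                    # shared block exponents
assert bfp_dot(a, exp_a, b, exp_b) == np.dot(a, b) * 2.0 ** (exp_a + exp_b)
```

Because every value in a block shares one exponent, the exponent arithmetic factors out of the summation entirely; only the narrow integer partial products need to run on the low-precision datapath.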
Keywords
Block floating-point, Significand, Dot product, Value (computer science), Computation, Artificial neural network, Arithmetic, Operation, Bit, Computer science