SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions
arXiv (2023)
Abstract
Recent quantization techniques have enabled heterogeneous precisions at very
fine granularity, e.g., each parameter/activation can take on a different
precision, resulting in compact neural networks without sacrificing accuracy.
However, there is a lack of efficient architectural support for such networks,
which require additional hardware to decode the precision settings for
individual variables, align the variables, and provide fine-grained
mixed-precision compute capabilities. The complexity of these operations
introduces high overheads. Thus, the improvements in inference latency/energy
of these networks are not commensurate with the compression ratio, and may be
inferior to larger quantized networks with uniform precisions.
We present an end-to-end co-design approach encompassing computer
architecture, training algorithm, and inference optimization to efficiently
execute networks with fine-grained heterogeneous precisions. The key to our
approach is a novel training algorithm designed to accommodate hardware
constraints and inference operation requirements, outputting networks with
input-channel-wise heterogeneous precisions and at most three precision levels.
Combined with inference optimization techniques, existing architectures with
low-cost enhancements can support such networks efficiently, yielding optimized
tradeoffs between accuracy, compression ratio and inference latency/energy.
We demonstrate the efficacy of our approach across CPU and GPU architectures.
For various representative neural networks, our approach achieves >10x
improvements in both compression ratio and inference latency, with negligible
degradation in accuracy compared to full-precision networks.
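
To make "input-channel-wise heterogeneous precisions with at most three precision levels" concrete, the following is a minimal sketch of what such a weight layout could look like after training. It is not the authors' training algorithm or hardware mapping. The function names (`quantize_channel`, `quantize_weights`, `compression_ratio`), the symmetric uniform quantizer, and the example precisions of 1, 2, and 4 bits are all assumptions made for illustration; the abstract does not state which bit widths are used. The compression ratio here ignores any metadata needed to record per-channel precisions.

```python
from typing import List, Sequence


def quantize_channel(values: Sequence[float], bits: int) -> List[float]:
    """Quantize one input channel's weights to `bits` bits (illustrative scheme)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    if bits == 1:
        # Assumed 1-bit scheme: keep only the sign, scaled by the mean magnitude.
        scale = sum(abs(v) for v in values) / len(values)
        return [scale if v >= 0 else -scale for v in values]
    # Assumed multi-bit scheme: symmetric uniform grid with 2^(bits-1) - 1 positive levels.
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    quantized = []
    for v in values:
        q = round(v / scale)
        q = max(-levels, min(levels, q))  # clamp against floating-point overshoot
        quantized.append(q * scale)
    return quantized


def quantize_weights(
    channels: Sequence[Sequence[float]], channel_bits: Sequence[int]
) -> List[List[float]]:
    """Quantize each input channel with its own precision.

    `channels[i]` holds all weights that multiply input channel i.
    At most three distinct precisions are allowed, mirroring the constraint
    described in the abstract.
    """
    if len(channels) != len(channel_bits):
        raise ValueError("one precision is required per input channel")
    if len(set(channel_bits)) > 3:
        raise ValueError("at most three distinct precision levels are allowed")
    return [quantize_channel(c, b) for c, b in zip(channels, channel_bits)]


def compression_ratio(
    channel_sizes: Sequence[int], channel_bits: Sequence[int], full_bits: int = 32
) -> float:
    """Ratio of full-precision storage to mixed-precision storage (weights only)."""
    full = full_bits * sum(channel_sizes)
    mixed = sum(n * b for n, b in zip(channel_sizes, channel_bits))
    return full / mixed


if __name__ == "__main__":
    weights = [[0.5, -1.5], [0.9, -1.0], [0.3, 0.7], [-0.2, 0.4]]
    bits = [1, 2, 4, 4]  # hypothetical per-channel precisions
    print(quantize_weights(weights, bits))
    print(compression_ratio([len(c) for c in weights], bits))
```

Grouping precision by input channel means every weight that multiplies a given input channel shares one bit width, so the precision only needs to be decoded once per channel rather than once per variable, which is the kind of overhead the abstract identifies in finer-grained schemes.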