TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
arXiv (2024)
Abstract
Diffusion models have emerged as leading contenders among generative models.
They are distinguished by a sequential generative process spanning hundreds or
even thousands of timesteps: starting from pure Gaussian noise, they
progressively reconstruct an image, and every timestep requires a full
inference pass through the entire model. The resulting computational demands
make deployment challenging, so quantization is widely used to lower the
bit-width and thereby reduce storage and computing overheads. Current
quantization methods, however, focus primarily on model-side optimization and
disregard the temporal dimension, such as the length of the timestep sequence.
Redundant timesteps therefore continue to consume computational resources,
leaving substantial room to accelerate the generative process. In this paper,
we introduce TMPQ-DM, which jointly optimizes timestep reduction and
quantization, addressing both the temporal and the model dimensions to achieve
a superior performance-efficiency trade-off. For timestep reduction, we devise
a non-uniform grouping scheme tailored to the non-uniform nature of the
denoising process, which mitigates the combinatorial explosion in the number
of timestep combinations.
For quantization, we adopt a fine-grained, layer-wise approach that allocates
different bit-widths to different layers according to their contribution to
the final generative performance, thereby correcting the performance
degradation observed in prior studies. To speed up the evaluation of
fine-grained quantization configurations, we further devise a super-network
that serves as a precision solver by leveraging shared quantization results.
These two components are integrated within a single framework, enabling rapid
joint exploration of the exponentially large decision space via a
gradient-free evolutionary search algorithm.
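The abstract does not say how the non-uniform timestep groups are formed. As a minimal sketch, assuming group boundaries follow a quadratic spacing (narrow groups near t = 0, wide groups near t = T) and that the search only decides how many evenly spaced timesteps to keep in each group, the Python below shows why grouping shrinks the decision space: with G groups and k count options per group there are k^G candidates, rather than one per subset of all T timesteps. The names `nonuniform_groups`, `steps_from_counts`, `search_space_size` and the `power` parameter are hypothetical and not taken from TMPQ-DM.

```python
def nonuniform_groups(num_timesteps: int, num_groups: int, power: float = 2.0) -> list[range]:
    """Split timesteps [0, T) into groups whose boundaries follow T * (i / G) ** power.

    Hypothetical spacing: with power > 1 the groups are narrow near t = 0 and
    wide near t = T, so a search can allocate steps more finely there.
    """
    bounds = [int(num_timesteps * (i / num_groups) ** power) for i in range(num_groups + 1)]
    # Drop empty groups, which can appear when T is small relative to G.
    return [range(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if hi > lo]


def steps_from_counts(groups: list[range], counts: list[int]) -> list[int]:
    """Keep `count` evenly spaced timesteps inside each group; return them in sampling order."""
    selected = []
    for group, count in zip(groups, counts):
        count = min(count, len(group))
        if count == 0:
            continue
        stride = len(group) / count
        selected.extend(group[int(j * stride)] for j in range(count))
    # Sampling runs from noisy (large t) to clean (small t).
    return sorted(selected, reverse=True)


def search_space_size(num_groups: int, num_count_options: int) -> int:
    """Number of candidates when each group independently picks one of the count options."""
    return num_count_options ** num_groups
```

For example, `nonuniform_groups(1000, 4)` yields groups of 62, 188, 312 and 438 timesteps, and `steps_from_counts(groups, [4, 2, 2, 2])` keeps 10 timesteps, four of them from the finest group nearest t = 0.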
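The super-network used as a precision solver can be illustrated in a similarly hedged way. Assuming symmetric uniform fake-quantization and a fixed menu of candidate bit-widths (both assumptions, not details given in the abstract), each layer is quantized once per bit-width up front, and any layer-wise bit assignment is then assembled by cache lookup instead of re-quantizing the model for every candidate. `quantize`, `SharedQuantSupernet`, `materialize` and `size_in_bits` are hypothetical names for illustration.

```python
def quantize(weights: list[float], bits: int) -> list[float]:
    """Symmetric uniform fake-quantization of one layer's weights to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp to [-qmax, qmax], then dequantize.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


class SharedQuantSupernet:
    """Hypothetical precision solver that quantizes every layer once per candidate bit-width.

    Evaluating a layer-wise bit assignment then only looks up cached weights,
    so no candidate triggers a fresh quantization pass.
    """

    def __init__(self, layers: dict[str, list[float]], candidate_bits: tuple[int, ...] = (4, 6, 8)):
        self.layers = layers
        self.cache = {
            name: {bits: quantize(w, bits) for bits in candidate_bits}
            for name, w in layers.items()
        }

    def materialize(self, assignment: dict[str, int]) -> dict[str, list[float]]:
        """Assemble the quantized weights for a {layer name: bit-width} assignment."""
        return {name: self.cache[name][bits] for name, bits in assignment.items()}

    def size_in_bits(self, assignment: dict[str, int]) -> int:
        """Weight storage cost: parameter count times bit-width, summed over layers."""
        return sum(len(self.layers[name]) * bits for name, bits in assignment.items())
```

The cache trades memory (one copy per layer and bit-width) for speed: the number of quantization passes grows with layers times candidate bit-widths, not with the number of assignments the search evaluates.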
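A gradient-free evolutionary search over the joint space might look like the following sketch. Each candidate pairs a timestep count per group with a bit-width per layer. The `fitness` and `cost` callables are placeholders supplied by the caller (in practice the fitness would be a generation-quality score measured with the quantized model, and the cost an efficiency measure); the population size, elitist selection, uniform crossover, per-gene mutation and rejection of over-budget candidates are illustrative choices, not settings reported for TMPQ-DM.

```python
import random
from typing import Callable, Sequence

# A candidate pairs timestep counts per group with bit-widths per layer.
Candidate = tuple[tuple[int, ...], tuple[int, ...]]


def evolutionary_search(
    fitness: Callable[[Candidate], float],
    cost: Callable[[Candidate], float],
    budget: float,
    num_groups: int,
    num_layers: int,
    step_choices: Sequence[int] = (1, 2, 3, 4),
    bit_choices: Sequence[int] = (4, 6, 8),
    population: int = 20,
    generations: int = 30,
    mutate_prob: float = 0.2,
    seed: int = 0,
) -> Candidate:
    """Gradient-free search: elitist selection, uniform crossover, per-gene mutation."""
    rng = random.Random(seed)

    def random_candidate() -> Candidate:
        return (tuple(rng.choice(step_choices) for _ in range(num_groups)),
                tuple(rng.choice(bit_choices) for _ in range(num_layers)))

    def crossover(a: Candidate, b: Candidate) -> Candidate:
        # Each gene is inherited from either parent with equal probability.
        return (tuple(rng.choice(p) for p in zip(a[0], b[0])),
                tuple(rng.choice(p) for p in zip(a[1], b[1])))

    def mutate(c: Candidate) -> Candidate:
        # Each gene is resampled independently with probability `mutate_prob`.
        return (tuple(rng.choice(step_choices) if rng.random() < mutate_prob else s for s in c[0]),
                tuple(rng.choice(bit_choices) if rng.random() < mutate_prob else b for b in c[1]))

    def feasible(make: Callable[[], Candidate], tries: int = 1000) -> Candidate:
        # Rejection sampling: only candidates within the efficiency budget are kept.
        for _ in range(tries):
            c = make()
            if cost(c) <= budget:
                return c
        raise RuntimeError("no candidate within budget found")

    pop = [feasible(random_candidate) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: population // 2]  # elitism: the best half survives unchanged
        children = [
            feasible(lambda: mutate(crossover(*rng.sample(parents, 2))))
            for _ in range(population - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)
```

With a toy cost of `sum(steps) * sum(bits)`, loosely reflecting that every retained timestep runs the whole model at its chosen precision, and a concave fitness that rewards more steps and higher precision with diminishing returns, the search returns a within-budget candidate. Because the top half of the population is carried over unchanged each generation, the best fitness found never decreases.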