Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers
CoRR (2024)

Abstract
Recently, diffusion transformers have gained wide attention for their
excellent performance in text-to-image and text-to-video models, emphasizing
the need for transformers as backbones for diffusion models. Transformer-based
models have shown better generalization capability compared to CNN-based models
for general vision tasks. However, much less has been explored in the existing
literature regarding the capabilities of transformer-based diffusion backbones
and expanding their generative prowess to other datasets. This paper focuses on
enabling a single pre-trained diffusion transformer model to scale across
multiple datasets swiftly, allowing for the completion of diverse generative
tasks using just one model. To this end, we propose DiffScaler, an efficient
scaling strategy for diffusion models where we train a minimal amount of
parameters to adapt to different tasks. In particular, we learn task-specific
transformations at each layer by incorporating the ability to utilize the
learned subspaces of the pre-trained model, as well as the ability to learn
additional task-specific subspaces, which may be absent in the pre-training
dataset. As these parameters are independent, a single diffusion model with
these task-specific parameters can be used to perform multiple tasks
simultaneously. Moreover, we find that transformer-based diffusion models
significantly outperform CNN-based diffusion models when fine-tuning on
smaller datasets. We perform experiments on four unconditional
image generation datasets. We show that using our proposed method, a single
pre-trained model can scale up to perform these conditional and unconditional
tasks with minimal parameter tuning, while performing nearly as well as a
diffusion model fully fine-tuned for that particular task.
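The adapter idea described in the abstract — reusing the pre-trained model's learned subspaces while adding a small number of task-specific ones — can be sketched as a per-layer wrapper in the LoRA style. This is an illustrative reading, not the paper's exact parameterization: the class name `ScaledLinear`, the diagonal output scaling, and the low-rank term are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Hypothetical per-layer adapter for a frozen pre-trained linear layer.

    Learns (a) a per-feature scaling of the frozen layer's output, which
    reweights the subspaces already learned during pre-training, and
    (b) a low-rank residual term that can capture task-specific subspaces
    absent from the pre-training data. Only these small tensors are trained.
    """

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen

        out_f, in_f = base.out_features, base.in_features
        # (a) reweight existing subspaces; identity at initialization
        self.scale = nn.Parameter(torch.ones(out_f))
        # (b) low-rank task-specific subspace; zero-initialized so the
        # wrapped layer initially matches the pre-trained one exactly
        self.down = nn.Parameter(torch.zeros(rank, in_f))
        self.up = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.base(x) + x @ self.down.T @ self.up.T
```

Because the trainable tensors are independent of the frozen backbone, one copy of the pre-trained model can hold several such parameter sets and switch between tasks by swapping adapters.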