Few-shot Learner Parameterization by Diffusion Time-steps
CVPR 2024
Abstract
Even when using large multi-modal foundation models, few-shot learning is
still challenging – if there is no proper inductive bias, it is nearly
impossible to keep the nuanced class attributes while removing the visually
prominent attributes that spuriously correlate with class labels. To this end,
we find an inductive bias that the time-steps of a Diffusion Model (DM) can
isolate the nuanced class attributes, i.e., as the forward diffusion adds noise
to an image at each time-step, nuanced attributes are usually lost at an
earlier time-step than the spurious attributes that are visually prominent.
Building on this, we propose the Time-step Few-shot (TiF) learner. We train
class-specific low-rank adapters for a text-conditioned DM to make up for the
lost attributes, such that images can be accurately reconstructed from their
noisy versions given a prompt. Hence, at a small time-step, the adapter and prompt
are essentially a parameterization of only the nuanced class attributes. For a
test image, we can use the parameterization to only extract the nuanced class
attributes for classification. TiF learner significantly outperforms OpenCLIP
and its adapters on a variety of fine-grained and customized few-shot learning
tasks. Code is available at https://github.com/yue-zhongqi/tif.
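
The claim that nuanced attributes are lost at an earlier time-step than visually prominent ones can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes the standard forward process x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with a linear beta schedule (an assumption, since the abstract does not name the schedule), and it treats an attribute as a signal component with a fixed amplitude that counts as "lost" once its signal-to-noise ratio falls below 1. The amplitudes 0.1 and 1.0 and the helper names are hypothetical.

```python
import numpy as np

def alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal retention abar_t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, abar, rng):
    """Sample x_t from x_0 in closed form: scale the signal, add Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

def first_lost_step(amplitude, abar):
    """First step at which a component of the given amplitude has SNR < 1.

    SNR at step t is amplitude^2 * abar_t / (1 - abar_t), which decreases in t.
    The SNR < 1 threshold is a hypothetical criterion for "lost".
    """
    snr = amplitude ** 2 * abar / (1.0 - abar)
    return int(np.argmax(snr < 1.0))

abar = alpha_bar()
t_fine = first_lost_step(0.1, abar)    # low-amplitude, nuanced attribute (hypothetical)
t_coarse = first_lost_step(1.0, abar)  # high-amplitude, prominent attribute (hypothetical)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # stand-in for an image
xt = forward_diffuse(x0, t_fine, abar, rng)

print("fine detail lost at step:", t_fine)
print("coarse structure lost at step:", t_coarse)
```

Under these assumptions the low-amplitude component drops below the noise floor at a smaller time-step than the high-amplitude one, which is the ordering the abstract relies on when it restricts the adapter to small time-steps.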