TVT: Training-Free Vision Transformer Search on Tiny Datasets.
CoRR (2023)
Abstract
Training-free Vision Transformer (ViT) architecture search uses zero-cost
proxies to find better ViTs. While ViTs achieve significant
distillation gains from CNN teacher models on small datasets, the current
zero-cost proxies in ViTs do not generalize well to the distillation training
paradigm according to our experimental observations. In this paper, for the
first time, we investigate how to search in a training-free manner with the
help of teacher models and devise an effective Training-free ViT (TVT) search
framework. First, we observe that the similarity of attention maps between
ViT and ConvNet teachers notably affects distillation accuracy. Thus, we present a
teacher-aware metric conditioned on the feature attention relations between
teacher and student. Additionally, TVT employs the L2-norm of the student's
weights as the student-capability metric to improve ranking consistency.
Finally, TVT searches for the best ViT for distillation from ConvNet teachers via
our teacher-aware metric and student-capability metric, resulting in impressive
gains in efficiency and effectiveness. Extensive experiments on various tiny
datasets and search spaces show that our TVT outperforms state-of-the-art
training-free search methods. The code will be released.
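
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the use of a cosine-similarity relation map over token features, the Frobenius distance between teacher and student maps, and the additive combination controlled by `weight` are all assumptions made for illustration only.

```python
import math

def l2_norm_of_weights(layers):
    """Student-capability proxy: L2 norm over all weights.
    `layers` is a list of flat weight lists (hypothetical layout)."""
    return math.sqrt(sum(w * w for layer in layers for w in layer))

def relation_map(features):
    """Token-to-token cosine-similarity matrix (assumed form of a
    'feature attention relation'). `features` is a list of token vectors."""
    normed = []
    for vec in features:
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard zero vectors
        normed.append([v / norm for v in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]

def teacher_aware_score(student_features, teacher_features):
    """Negative Frobenius distance between the two relation maps, so a
    student whose relations match the teacher's scores higher (assumption).
    Both inputs need the same number of tokens; channel widths may differ."""
    s_map = relation_map(student_features)
    t_map = relation_map(teacher_features)
    sq = sum((a - b) ** 2
             for s_row, t_row in zip(s_map, t_map)
             for a, b in zip(s_row, t_row))
    return -math.sqrt(sq)

def combined_score(student_features, teacher_features, layers, weight=1.0):
    """Additive combination of both proxies; the weighting is hypothetical."""
    return (teacher_aware_score(student_features, teacher_features)
            + weight * l2_norm_of_weights(layers))

def rank_candidates(candidates, teacher_features, weight=1.0):
    """Sort candidate dicts (keys 'name', 'features', 'layers') best-first."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c["features"], teacher_features,
                                     c["layers"], weight),
        reverse=True,
    )
```

In practice the features would come from a forward pass of each randomly initialized candidate and of the ConvNet teacher on a small batch; the sketch takes them as plain lists so that it stays self-contained.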