Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization

Xiaohan Yu, Jun Wang, Yang Zhao, Yongsheng Gao

Pattern Recognition (2023)

Abstract

• We introduce Mix-ViT, a novel mixing attentive vision transformer, bridging the gap between vision transformers and ultra-fine-grained visual categorization (ultra-FGVC) tasks.
• A self-supervised module is developed to mix the high-level sample tokens and predict attentively substituted tokens, which enables understanding of contextual details towards discriminative ultra-FGVC.
• Mix-ViT achieves competitive performance on five ultra-fine-grained datasets and two fine-grained datasets, demonstrating its effectiveness and efficiency for the challenging ultra-FGVC tasks.

Keywords

Ultra-fine-grained visual categorization, Vision transformer, Self-supervised learning, Attentive mixing
