Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization

Pattern Recognition (2023)

Abstract
• We introduce Mix-ViT, a novel mixing attentive vision transformer, bridging the gap between vision transformers and ultra-fine-grained visual categorization (ultra-FGVC) tasks.
• A self-supervised module is developed to mix the high-level sample tokens and predict attentively substituted tokens, which enables understanding of contextual details towards discriminative ultra-FGVC.
• Mix-ViT achieves competitive performance on five ultra-fine-grained datasets and two fine-grained datasets, demonstrating its effectiveness and efficiency for the challenging ultra-FGVC tasks.
Keywords
Ultra-fine-grained visual categorization, Vision transformer, Self-supervised learning, Attentive mixing