The Image Data and Backbone in Weakly Supervised Fine-Grained Visual Categorization: A Revisit and Further Thinking

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Weakly supervised fine-grained visual categorization (FGVC) aims to classify subclasses within a single broad class using only label information. Compared to general images, fine-grained images have similar appearances and features, and are often affected by disturbances such as viewpoint, lighting, and occlusion during data collection, resulting in large intra-class variance and small inter-class variance. To achieve FGVC, carefully designed models are often needed to explore the locally discriminative regions of the image. This paper revisits high-quality FGVC publications based on deep learning and analyzes them from two new perspectives: fine-grained image data and backbone. We address two overlooked but interesting problems in FGVC. First, we argue that the factors that exacerbate intra-class variance differ across animal, plant, and commodity data, so the effects of posture, covariate shift, and structural changes must be considered. Additionally, the "soft boundary" between subclasses makes classification more difficult. Second, we highlight that convolutional networks and self-attention networks have different receptive fields and shape biases, leading to performance differences when processing different types of fine-grained data. Overall, our analysis provides new insights into recent advances, challenges, and future directions for deep-learning-based FGVC, which can help researchers develop more effective models.
Keywords
Fine-grained visual categorization, deep learning, weakly supervised learning