Part-Stacked CNN for Fine-Grained Visual Categorization

Shaoli Huang, Zhe Xu, Dacheng Tao, Ya Zhang

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

Abstract

In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the fine-grained recognition process by modeling subtle differences from object parts. Based on manually-labeled strong part annotations, the proposed architecture consists of a fully convolutional network to locate multiple object parts and a two-stream classification network that encodes object-level and part-level cues simultaneously. By adopting a set of sharing strategies between the computation of multiple object parts, the proposed architecture is very efficient, running at 20 frames/sec during inference. Experimental results on the CUB-200-2011 dataset reveal the effectiveness of the proposed architecture, from both the perspective of classification accuracy and model interpretability.

Keywords

fine-grained visual categorization, part-stacked CNN, human-understandable visual manuals, manually-labeled annotations, fully convolutional network, multiple object parts location, two-stream classification network, object-level cues, part-level cues, classification accuracy, model interpretability
