Mimicking Very Efficient Network For Object Detection

30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)

Abstract
Current CNN-based object detectors need to be initialized from pre-trained ImageNet classification models, and this pre-training is usually time-consuming. In this paper, we present a fully convolutional feature mimic framework for training very efficient CNN-based detectors that do not need ImageNet pre-training and achieve performance competitive with large, slow models. During training, we add supervision from the high-level features of the large network to help the small network learn better object representations. More specifically, we apply the mimic method to features sampled from the entire feature map and use a transform layer to map features from the small network to the same dimension as those of the large network. When training the small network, we optimize the similarity between features sampled from the same regions on the feature maps of both networks. Extensive experiments are conducted on pedestrian and common object detection tasks using VGG, Inception, and ResNet. On both Caltech and Pascal VOC, we show that the modified, 2.5x accelerated Inception network achieves performance competitive with the full Inception network. Our faster model runs at 80 FPS on a large 1000x1500 input with only a minor degradation in performance on Caltech.
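The feature mimic idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss, so the squared L2 distance, the per-region normalization, the 1x1 linear transform, and the names `transform`, `mimic_loss`, and the `(x0, y0, x1, y1)` region format are all assumptions made for illustration.

```python
import numpy as np


def transform(student_feat, weight):
    """Map student features (C_s, H, W) to the teacher dimension (C_t, H, W).

    Assumption: the transform layer is modeled as a 1x1 convolution,
    i.e. the same (C_t, C_s) matrix applied at every spatial location.
    """
    return np.einsum("tc,chw->thw", weight, student_feat)


def mimic_loss(teacher_feat, student_feat, weight, regions):
    """Distance between teacher and transformed student features,
    sampled from the same regions of both feature maps.

    regions: non-empty boxes (x0, y0, x1, y1) in feature-map
    coordinates, half-open (hypothetical format).
    """
    mapped = transform(student_feat, weight)
    total = 0.0
    for x0, y0, x1, y1 in regions:
        t = teacher_feat[:, y0:y1, x0:x1]
        s = mapped[:, y0:y1, x0:x1]
        # Assumption: divide by the region's feature count so that
        # large regions do not dominate the loss.
        total += np.sum((t - s) ** 2) / t.size
    return total / (2 * len(regions))


# Toy usage with random feature maps standing in for real networks.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 6, 8))   # large network: 16 channels
student = rng.normal(size=(4, 6, 8))    # small network: 4 channels
weight = rng.normal(size=(16, 4))       # transform layer parameters
loss = mimic_loss(teacher, student, weight, [(0, 0, 3, 3), (2, 1, 8, 5)])
```

In a real training setup, `weight` and the small network's parameters would be updated by gradient descent to reduce this loss, and the regions would typically come from a proposal stage; both are outside the scope of this sketch.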
Keywords
Pascal VOC, Caltech, ResNet, Inception, VGG, object representation, CNN based detectors, convolutional feature mimic framework, ImageNet classification, object detection