Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

European Conference on Computer Vision (ECCV), pp. 531-548, 2020


Abstract

In the feature maps of CNNs, there commonly exists considerable spatial redundancy that leads to much repetitive processing. Towards reducing this superfluous computation, we propose to compute features only at sparsely sampled locations, which are probabilistically chosen according to activation responses, and then densely reconstruct the feature map with an efficient interpolation procedure.

Introduction
  • On many computer vision tasks, significant improvements in accuracy have been achieved through increasing model capacity in convolutional neural networks (CNNs) [12,32].
  • A common approach to this problem is to prune weights and neurons that are not needed to maintain the network's performance [19,11,10,34,14,20,27,38]
  • Orthogonal to these architectural changes are methods that eliminate computation at inference time conditioned on the input.
  • As illustrated in Fig. 1(b), this approach deterministically samples predicted foreground areas while avoiding computational expenditure on the background
Highlights
  • On many computer vision tasks, significant improvements in accuracy have been achieved through increasing model capacity in convolutional neural networks (CNNs) [12,32]
  • A common approach to this problem is to prune weights and neurons that are not needed to maintain the network's performance [19,11,10,34,14,20,27,38]. Orthogonal to these architectural changes are methods that eliminate computation at inference time conditioned on the input. These techniques are typically based on feature map sparsity, where the locations of zero-valued activations are predicted so that the computation at those positions can be skipped [7,30,1]
  • We present a stochastic sampling and interpolation scheme to avoid expensive computation at spatial locations that can be effectively interpolated
  • To overcome the challenge of training binary decision variables for representing discrete sampling locations, Gumbel-Softmax is introduced to our sampling module (see the sketch after this list)
  • The effectiveness of this approach is verified on a variety of computer vision tasks
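
Below is a minimal, hypothetical PyTorch sketch of the idea referenced above: a per-location keep/skip decision trained with the straight-through Gumbel-Softmax relaxation. The module name GumbelSpatialMask, the 1x1-convolution mask predictor, and the temperature tau are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GumbelSpatialMask(nn.Module):
        """Predict a binary keep/skip mask over the spatial locations of a feature map.

        During training, F.gumbel_softmax with hard=True yields a discrete 0/1 sample
        in the forward pass while gradients flow through the soft relaxation
        (straight-through estimator). At inference, the mask is the plain argmax decision.
        """
        def __init__(self, channels, tau=1.0):
            super().__init__()
            # Two logits per spatial location: index 0 = skip, index 1 = keep.
            self.logits = nn.Conv2d(channels, 2, kernel_size=1)
            self.tau = tau

        def forward(self, x):
            logits = self.logits(x)                                   # (N, 2, H, W)
            if self.training:
                sample = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=1)
                return sample[:, 1:2]                                 # keep channel, (N, 1, H, W)
            # Inference: deterministic argmax decision, no sampling noise.
            return (logits[:, 1:2] > logits[:, 0:1]).float()

    # Example: mask = GumbelSpatialMask(256)(features)   # features: (N, 256, H, W)
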
Methods
  • On the image classification task, the approach achieves comparable accuracy with fewer FLOPs. Compared with SFP [13] and FPGM [14], it obtains a smaller accuracy drop at similar FLOPs. The authors further remove the interpolation module from the method and fill the features of unsampled points with 0.
  • Results show that removing interpolation does not affect performance on the ImageNet validation set
  • This contrasts with the findings on object detection and semantic segmentation.
  • In the image classification task, it is not important to reconstruct the features of unsampled points by interpolation (a sketch contrasting interpolation with zero-filling follows this list)
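
To make the ablation above concrete, here is a small sketch, under assumed tensor shapes, of the two variants being compared: zero-filling unsampled positions versus reconstructing them from sampled neighbours. The fixed Gaussian kernel and the normalized (masked) convolution used here are only stand-ins for the paper's learned interpolation kernel.

    import torch
    import torch.nn.functional as F

    def zero_fill(feat, mask):
        # Ablation variant ("w/o Interp"): unsampled positions are simply set to 0.
        return feat * mask

    def interpolate_unsampled(feat, mask, kernel_size=7, sigma=2.0):
        """Fill unsampled positions with a distance-weighted average of sampled ones.

        feat: (N, C, H, W) features computed at sampled positions; mask: (N, 1, H, W)
        with entries in {0, 1}. Implemented as a normalized (masked) convolution with
        a fixed Gaussian kernel, standing in for the learned interpolation kernel.
        """
        c = feat.shape[1]
        coords = torch.arange(kernel_size, dtype=feat.dtype, device=feat.device) - kernel_size // 2
        g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
        kernel = (g[:, None] * g[None, :]).view(1, 1, kernel_size, kernel_size)
        pad = kernel_size // 2

        num = F.conv2d(feat * mask, kernel.repeat(c, 1, 1, 1), padding=pad, groups=c)
        den = F.conv2d(mask, kernel, padding=pad).clamp(min=1e-6)
        filled = num / den
        # Keep the computed features where sampled, use the interpolation elsewhere.
        return feat * mask + filled * (1 - mask)
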
Results
  • Results are shown in Fig. 4(b), and the numerical results are presented in the Appendix. For the method and for deterministic Gumbel-Softmax, the authors draw curves according to different sparse loss weights {0.3, 0.2, 0.1, 0.05} (the form of such a sparsity-weighted objective is sketched after this list).
  • Λ is quite large in most cases, which means the effect of the interpolation module is limited
  • This phenomenon is consistent with the experimental results of the “w/o Interp” entry in Table 4, where the results without interpolation are almost identical to those of the full model, further indicating that interpolation is not important for image classification.
  • The reason is still unclear, but the authors suspect that this phenomenon may be related to the receptive field of operators
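
The sparse loss weights swept above suggest a training objective of the form "task loss plus a weighted penalty on the fraction of sampled positions". The snippet below is an assumed formulation of such a trade-off (the function name and arguments are illustrative), not the paper's exact loss.

    import torch

    def sparsity_regularized_loss(task_loss, masks, sparse_loss_weight=0.1):
        """Trade accuracy against computation via a penalty on sampled positions.

        masks: list of (N, 1, H, W) binary sampling masks, one per sparsified layer.
        Larger sparse_loss_weight values (e.g. 0.3 vs. 0.05) push the network to
        sample fewer positions, trading accuracy for FLOPs as in Fig. 4.
        """
        sampled_fraction = torch.stack([m.float().mean() for m in masks]).mean()
        return task_loss + sparse_loss_weight * sampled_fraction
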
Conclusion
  • A method for reducing computation in convolutional networks was proposed that exploits the intrinsic sparsity and spatial redundancy in feature maps.
  • The authors present a stochastic sampling and interpolation scheme to avoid expensive computation at spatial locations that can be effectively interpolated.
  • To overcome the challenge of training binary decision variables for representing discrete sampling locations, Gumbel-Softmax is introduced to the sampling module.
  • The effectiveness of this approach is verified on a variety of computer vision tasks
Tables
  • Table1: Comparison of different interpolation kernels on COCO2017 validation
  • Table2: Validation of the interpolation module on COCO2017 validation
  • Table3: Comparison of different grid prior settings (s = 9, s = 11, s = 13, and without grid prior) on COCO2017 validation
  • Table4: Performance comparison on the ImageNet validation set. All the methods are based on ResNet-34. Our models are trained with loss weights of 0.01 and 0.015 to achieve accuracy or FLOPs similar to other methods for fair comparison. “w/o Interp” indicates removing the interpolation module and filling the features of unsampled positions with 0
  • Table5: Comparison of theoretical and realistic speedups on E5-2650 and i7-6650U CPUs. The baseline model is trained and evaluated on images with a shorter side of 1000 pixels. The CPU run-time is calculated on the COCO2017 validation set
  • Table6: The numerical results of Fig. 4 (a) in the main paper. Experiments are conducted on object detection (COCO2017 validation)
  • Table7: Numerical results of Fig. 4 (b) in the main paper. Experiments are conducted on semantic segmentation (Cityscapes validation)
  • Table8: Evaluation of inference stability on object detection (COCO2017 validation)
  • Table9: Numerical results of Fig. 6. ResNeXt is chosen as the backbone model. Experiments are conducted on object detection (COCO2017 validation)
Related work
  • In this section, we briefly review related approaches for reducing computation in convolutional neural networks.
  • Model pruning: A widely investigated approach for improving network efficiency is to remove connections or filters that are unimportant for achieving high performance. The importance of these network elements has been approximated in various ways, including by connection weight magnitudes [11,10], filter norms [20,34,13], and filter redundancy within a layer [14]. To reflect network sensitivity to these elements, importance has also been measured based on their effects on the loss [19,27] or the reconstruction error of the final response layer [38] when removing them. Alternatively, sparsity learning techniques identify what to prune in conjunction with network training, through constraints that zero out some filters [34], cause some filters to become redundant and removable [6], scale some filter or block outputs to zero [22], or sparsify batch normalization scaling factors [25,36]. Model pruning techniques as well as other architecture-based acceleration schemes, such as low-rank factorizations of convolutional filters [17] and knowledge distillation of networks [16], are orthogonal to our approach and could potentially be employed in a complementary manner.
  • Early stopping: Rather than prune network elements, early stopping techniques reduce computation by skipping the processing at later stages whenever it is deemed to be unnecessary. In [8], an adaptive number of ResNet layers are skipped within a residual block for unimportant regions in object classification. The skipping mechanism is controlled by halting scores predicted at branches to the output of each residual unit. In [21], a deep model for semantic segmentation is turned into a cascade of sub-models where earlier sub-models handle easy regions and harder cases are progressively fed forward to the next sub-model for further processing. Like our method, these techniques spatially adapt the processing to the input content. However, they process all spatial positions at least to some degree, which limits the achievable computational savings.
  • Activation sparsity: The activations of rectified linear units (ReLUs) are commonly sparse. This property has been exploited for network acceleration by excluding the zero values from subsequent convolutions [31,28]. This approach has been extended by estimating the activation sparsity and skipping the computation for predicted insignificant activations. The sparsity has been predicted from prior knowledge of road and sidewalk locations in autonomous driving applications [30], from model-predicted foreground masks at low resolution [30], from a small auxiliary layer that supplements each convolutional layer [7], and from a highly quantized version of the convolutional layer [1]. Our work instead reconstructs activation maps by interpolation from a sparse set of samples selected in a content-aware fashion, thus avoiding computation at locations where features can be easily reconstructed. Moreover, our probabilistic sampling distributes computation among feature map locations with varying levels of predicted activation, providing greater robustness to activation prediction errors.
  • Sparse sampling: To reduce processing cost, PerforatedCNNs compute only sparse samples of a convolutional layer's outputs and interpolate the remaining values [9]. The sampling follows a predefined pattern, and the interpolation is done by nearest neighbors. Our method also takes a sparse sampling and interpolation approach, but in contrast to the input-independent sampling and generic interpolation of PerforatedCNNs, the sampling in our network is adaptively determined from the input such that the sampling density reflects predicted activation values, and the interpolation parameters are learned. As shown later in the experiments, this approach allows for much greater sparsity in the sampling.
  • Gumbel-based selection: Random selection based on the Gumbel distribution has been used in making discrete decisions for network acceleration. The Gumbel-Softmax trick was utilized in adaptively choosing network layers to apply on an input image [33] and in selecting channels or layers to skip [15]. In contrast to these techniques, which determine computation based on image-level semantics for image classification, our sampling is driven by the spatial organization of features and is geared towards accurately reconstructing positional content. As a result, our method is well-suited to spatial understanding tasks such as object detection and semantic segmentation.
Funding
  • In our experiments, the sparsity of M can be greater than 70% on average
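
As a rough back-of-the-envelope illustration of what such sparsity buys (not the exact accounting behind Table 5): if a convolution is evaluated only at the sampled positions, its cost scales with the fraction of kept positions plus the overhead of mask prediction and interpolation. The overhead figure below is an assumed value for illustration.

    def theoretical_speedup(kept_fraction=0.3, overhead_fraction=0.05):
        """Crude per-layer speedup estimate from spatial sparsity.

        kept_fraction: share of spatial positions actually computed
                       (about 0.3 when the mask is >70% sparse, as noted above).
        overhead_fraction: assumed relative cost of mask prediction and interpolation.
        """
        return 1.0 / (kept_fraction + overhead_fraction)

    print(f"~{theoretical_speedup():.1f}x theoretical per-layer speedup")
    # Realistic speedups are lower (see Table 5): sparse computation on general-purpose
    # CPUs rarely reaches the theoretical FLOP reduction.
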
References
  • Cao, S., Ma, L., Xiao, W., Zhang, C., Liu, Y., Zhang, L., Nie, L., Yang, Z.: Seernet: Predicting convolutional neural network feature-map sparsity through low-bit quantization. In: CVPR (2019)
  • Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: mmdetection. https://github.com/open-mmlab/mmdetection (2018)
  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)
  • Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: CVPR (2017)
  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR (2009)
  • Ding, X., Ding, G., Guo, Y., Han, J.: Centripetal sgd for pruning very deep convolutional networks with complicated structure. In: CVPR (2019)
  • Dong, X., Huang, J., Yang, Y., Yan, S.: More is less: A more complicated network with less inference complexity. In: CVPR (2017)
  • Figurnov, M., Collins, M.D., Zhu, Y., Zhang, L., Huang, J., Vetrov, D., Salakhutdinov, R.: Spatially adaptive computation time for residual networks. In: CVPR (2017)
  • Figurnov, M., Ibraimova, A., Vetrov, D., Kohli, P.: Perforatedcnns: Acceleration through elimination of redundant convolutions. In: NIPS (2016)
  • Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient dnns. In: NIPS (2016)
  • Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural network. In: NIPS (2015)
  • He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • He, Y., Kang, G., Dong, X., Fu, Y., Yang, Y.: Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866 (2018)
  • He, Y., Liu, P., Wang, Z., Hu, Z., Yang, Y.: Filter pruning via geometric median for deep convolutional neural networks acceleration. In: CVPR (2019)
  • Herrmann, C., Bowen, R.S., Zabih, R.: An end-to-end approach for speeding up neural network inference. arXiv preprint arXiv:1812.04180v3 (2019)
  • Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
  • Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. In: BMVC (2014)
  • Jang, E., Gu, S., Poole, B.: Categorical reparameterization with gumbel-softmax. In: ICLR (2017)
  • LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: NIPS (1989)
  • Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. In: ICLR (2017)
  • Li, X., Liu, Z., Luo, P., Loy, C.C., Tang, X.: Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. In: CVPR (2017)
  • Lin, S., Ji, R., Yan, C., Zhang, B., Cao, L., Ye, Q., Huang, F., Doermann, D.: Towards optimal structured cnn pruning via generative adversarial learning. In: CVPR (2019)
  • Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
  • Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV (2014)
  • Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: ICCV (2017)
  • Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
  • Molchanov, P., Mallya, A., Tyree, S., Frosio, I., Kautz, J.: Importance estimation for neural network pruning. In: CVPR (2019)
  • Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J., Keckler, S.W., Dally, W.J.: Scnn: An accelerator for compressed-sparse convolutional neural networks. In: International Symposium on Computer Architecture (2017)
  • Peng, C., Xiao, T., Li, Z., Jiang, Y., Zhang, X., Jia, K., Yu, G., Sun, J.: Megdet: A large mini-batch object detector. In: CVPR (2018)
  • Ren, M., Pokrovsky, A., Yang, B., Urtasun, R.: Sbnet: Sparse blocks network for fast inference. In: CVPR (2018)
  • Shi, S., Chu, X.: Speeding up convolutional neural networks by exploiting the sparsity of rectifier units. arXiv preprint arXiv:1704.07724 (2017)
  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
  • Veit, A., Belongie, S.: Convolutional networks with adaptive inference graphs. In: ECCV (2017)
  • Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H.: Learning structured sparsity in deep neural networks. In: NIPS (2016)
  • Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1492–1500 (2017)
  • Ye, J., Lu, X., Lin, Z., Wang, J.Z.: Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In: ICLR (2018)
  • You, A., Li, X., Zhu, Z., Tong, Y.: Torchcv: A pytorch-based framework for deep learning in computer vision. https://github.com/donnyyou/torchcv (2019)
  • Yu, R., Li, A., Chen, C., Lai, J., Morariu, V.I., Han, X., Gao, M., Lin, C., Davis, L.S.: NISP: Pruning networks using neuron importance score propagation. In: CVPR (2018)
  • Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: CVPR (2019)