Adaptive Scenario Discovery for Crowd Counting.

ICASSP, Volume abs/1812.02393, 2019.


Abstract:

Crowd counting, i.e., estimating the number of pedestrians in crowd images, is emerging as an important research problem with public security applications. A key ingredient in the design of crowd counting systems is the construction of counting models that are robust to various scenarios under factors such as camera perspective and physic...

Introduction
  • Counting is the process of estimating the number of a particular object. With the expansion of urban population and the convenience of modern transportation, it is common to have large crowds in specific events or scenarios, and crowd counting from images or videos becomes crucial for applications ranging from traffic control to public safety.
Highlights
  • We present an adaptive scenario discovery framework for crowd counting
  • On Part B, our ASD framework achieves the best mean absolute error (MAE) 8.5 and mean squared error (MSE) 13.7 compared to the state-of-the-art
  • On UCF_CC_50, as in the experiments on ShanghaiTech, the ASD framework outperforms the other approaches: we improve on the previously reported state-of-the-art results by 26.3% in MAE and 31.8% in MSE, which indicates low variance in our predictions across high-density crowd images
  • Our proposed framework achieves state-of-the-art performance on two popular crowd counting datasets
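The percentage gains quoted in the highlights are relative reductions in error. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical and are not taken from the paper's tables.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error values, chosen only to illustrate the calculation.
print(round(relative_improvement(200.0, 150.0), 1))  # 25.0
```

A relative figure such as 26.3% depends on the baseline it is measured against, so it should be read together with the absolute MAE and MSE values.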
Methods
  • Table 1 reports MAE and MSE on ShanghaiTech Part A, ShanghaiTech Part B, and UCF_CC_50.
  • Methods compared: MCNN [5], DAN [21], CP-CNN [7], Huang et al. [22], ACSCP [12], DecideNet [23], SaCNN [11], and CSRNet [8].
Results
  • The authors first evaluate the overall results of the proposed framework.
  • The authors compare the framework with several state-of-the-art approaches, including the multi-column CNN with different receptive fields [5], the Switching-CNN that leverages variation in crowd density [6], and a very recent dilated-convolution-based model, CSRNet [8].
  • On Part A of ShanghaiTech, the authors reduce MAE by 24.8 in absolute terms relative to Switching-CNN [6] and by 2.6 relative to the state-of-the-art CSRNet [8].
  • On Part B, the ASD framework achieves the best MAE 8.5 and MSE 13.7 compared to the state-of-the-art.
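The MAE and MSE figures above are computed over per-image counts. The sketch below assumes the convention common in the crowd counting literature, where the reported "MSE" is the square root of the mean squared count error; the function names and sample counts are hypothetical.

```python
import math
from typing import Sequence


def mae(true_counts: Sequence[float], predicted_counts: Sequence[float]) -> float:
    """Mean absolute error between ground-truth and predicted counts."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    return sum(errors) / len(errors)


def root_mse(true_counts: Sequence[float], predicted_counts: Sequence[float]) -> float:
    """Square root of the mean squared count error (assumed 'MSE' convention)."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_counts, predicted_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical per-image counts for three test images.
truth = [10, 20, 30]
predicted = [12, 18, 33]
print(mae(truth, predicted), root_mse(truth, predicted))
```

Because the squared term penalises large misses more heavily, the second metric is the one that reflects how stable the predictions are across images.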
Conclusion
  • The authors have presented a novel architecture for high-density population counting.
  • The authors' approach focuses on the implicit discovery and dynamic modeling of scenarios.
  • The authors have reformulated the crowd counting problem as a scenario classification problem, in which the semantic scenario models are combined across prediction sub-tasks.
  • The adaptive scenario discovery is built to obtain two weights of different sizes through the parallel perception path for dynamic fusion.
  • The authors' proposed framework achieves state-of-the-art performance on two popular crowd counting datasets
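The dynamic fusion mentioned in the conclusion can be pictured as a weighted combination of the outputs of two parallel paths. The sketch below is an illustration under stated assumptions, not the authors' module: it assumes each path produces a density map, that the two weights are scalars normalised with a softmax, and that the count is the sum over the fused map (the usual convention for density-map counting). All names are hypothetical.

```python
import math
from typing import List

DensityMap = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_density_maps(map_a: DensityMap, map_b: DensityMap,
                      score_a: float, score_b: float) -> DensityMap:
    """Blend two same-sized density maps with softmax-normalised weights."""
    w_a, w_b = softmax([score_a, score_b])
    return [[w_a * a + w_b * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]


def count_from_density(density_map: DensityMap) -> float:
    """Estimated count: the sum of all density values."""
    return sum(sum(row) for row in density_map)


# Hypothetical 2x2 outputs of the two paths; equal scores give equal weights.
path_a = [[0.5, 0.5], [1.0, 0.0]]
path_b = [[1.0, 1.0], [1.0, 1.0]]
fused = fuse_density_maps(path_a, path_b, score_a=0.0, score_b=0.0)
print(count_from_density(fused))
```

In the paper the weights are described as having different sizes, so a faithful implementation would likely apply them per channel or per location rather than as two scalars.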
Tables
  • Table 1: Comparison with the state-of-the-arts on the benchmarks. Part A and Part B indicate ShanghaiTech Part A and
Related work
  • Numerous efforts have been devoted to the design of crowd counting models; a detailed survey of recent progress can be found in [10]. In this section, we mainly discuss the literature on models with multi-branch representations, which are most closely related to this work. In [5], Zhang et al. proposed MCNN, which uses three columns of convolutional neural networks with filters of different sizes. Sam et al. [6] proposed the Switching-CNN, which decouples the three columns into separate CNNs (each trained on a subset of the patches) and uses a density selector to exploit their structural and functional differences. Several works have studied the context information of crowd images in the multi-branch setting. For instance, Sindagi et al. [7] applied local and global context encoding to crowd counting and density estimation, and Zhang et al. [11] proposed a scale-adaptive CNN architecture with a backbone of fixed small receptive fields. Another work related to ours is CSRNet [8], which applies convolutional layers with dilation after the backbone of a pre-trained deep model.
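Both the multi-column design (filters of different sizes) and the dilated design aim at covering different spatial extents. As a small illustration of the dilation case, the sketch below computes the effective extent of a dilated kernel using the standard relation k + (k - 1)(d - 1); the function name is hypothetical and the kernel sizes and dilation rates are examples, not settings confirmed by the source.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a kernel of the given size and dilation rate."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# A 3x3 kernel covers a wider area as the dilation rate grows,
# while the number of learned weights stays at 9.
for rate in (1, 2, 4):
    print(rate, effective_kernel_size(3, rate))
# prints: 1 3, then 2 5, then 4 9
```

This is why dilation can enlarge the receptive field without the extra parameters that larger filters in a separate column would require.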
Funding
  • Our system is able to represent highly variable crowd images and achieves state-of-the-art results in two challenging benchmarks
Reference
  • [1] M. Wang and X. Wang, “Automatic adaptation of a generic pedestrian detector to a specific traffic scene,” in CVPR, 2011, pp. 3401–3408.
  • [2] R. Stewart, M. Andriluka, and A. Y. Ng, “End-to-end people detection in crowded scenes,” in CVPR, 2016, pp. 2325–2333.
  • [3] H. Idrees, I. Saleemi, C. Seibert, and M. Shah, “Multi-source multi-scale counting in extremely dense crowd images,” in CVPR, 2013, pp. 2547–2554.
  • [4] D. Onoro Rubio and R. J. Lopez-Sastre, “Towards perspective-free object counting with deep learning,” in ECCV, 2016, pp. 615–629.
  • [5] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, “Single-image crowd counting via multi-column convolutional neural network,” in CVPR, 2016, pp. 589–597.
  • [6] D. B. Sam, S. Surya, and R. V. Babu, “Switching convolutional neural network for crowd counting,” in CVPR, 2017, pp. 5744–5752.
  • [7] V. A. Sindagi and V. M. Patel, “Generating high-quality crowd density maps using contextual pyramid CNNs,” in ICCV, 2017, pp. 1879–1888.
  • [8] Y. Li, X. Zhang, and D. Chen, “CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes,” in CVPR, 2018, pp. 1091–1100.
  • [9] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [10] V. A. Sindagi and V. M. Patel, “A survey of recent advances in CNN-based single image crowd counting and density estimation,” Pattern Recognition Letters, vol. 107, pp. 3–16, 2018.
  • [11] L. Zhang, M. Shi, and Q. Chen, “Crowd counting via scale-adaptive convolutional neural network,” in WACV, 2018, pp. 1113–1121.
  • [12] Z. Shen, Y. Xu, B. Ni, M. Wang, J. Hu, and X. Yang, “Crowd counting via adversarial cross-scale consistency pursuit,” in CVPR, 2018, pp. 5245–5254.
  • [13] Z. Shi, L. Zhang, Y. Liu, X. Cao, Y. Ye, M. Cheng, and G. Zheng, “Crowd counting with deep negative correlation learning,” in CVPR, 2018, pp. 5382–5390.
  • [14] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, “ImageNet: A large-scale hierarchical image database,” in CVPR, 2009, pp. 248–255.
  • [15] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in CVPR, 2018, pp. 7132–7141.
  • [16] M. A. Stricker and M. Orengo, “Similarity of color images,” in Storage and Retrieval for Image and Video Databases III, 1995, vol. 2420, pp. 381–393.
  • [17] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
  • [18] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in CVPR, 2005, vol. 1, pp. 886–893.
  • [19] C. Zhang, H. Li, X. Wang, and X. Yang, “Cross-scene crowd counting via deep convolutional neural networks,” in CVPR, 2015, pp. 833–841.
  • [20] V. A. Sindagi and V. M. Patel, “CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting,” in AVSS, 2017, pp. 1–6.
  • [21] L. Wang, W. Shao, Y. Lu, H. Ye, J. Pu, and Y. Zheng, “Crowd counting with density adaption networks,” arXiv preprint arXiv:1806.10040, 2018.
  • [22] S. Huang, X. Li, Z. Zhang, F. Wu, S. Gao, R. Ji, and J. Han, “Body structure aware deep crowd counting,” IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1049–1059, 2018.
  • [23] J. Liu, C. Gao, D. Meng, and A. G. Hauptmann, “DecideNet: Counting varying density crowds through attention guided detection and density estimation,” in CVPR, 2018, pp. 5197–5206.
  • [24] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS Workshop, 2017.