Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting


Abstract

Crowd counting is a fundamental yet challenging problem, which desires rich information to generate pixel-wise crowd density maps. However, most previous methods only utilized the limited information of RGB images and may fail to discover potential pedestrians in unconstrained environments. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementary information of different modalities.

Introduction
  • Crowd counting [17, 9] is a fundamental computer vision task that aims to automatically estimate the number of people in unconstrained scenes.
  • It remains a very challenging problem that requires rich information to generate pixel-wise crowd density maps (a minimal density-map construction sketch follows this list).
  • Most previous methods only utilized the optical information extracted from RGB images and may fail to accurately recognize semantic objects in unconstrained scenarios.
  • RGB images alone cannot guarantee high-quality density maps.
  • More comprehensive information should be explored for crowd counting.
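To make the density-map target concrete, below is a minimal sketch of how pixel-wise density maps are commonly built from point annotations in crowd counting. The fixed-bandwidth Gaussian kernel and the helper name `points_to_density_map` are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def points_to_density_map(points, height, width, sigma=4.0):
    """Place a unit impulse at each annotated head, then blur with a Gaussian.

    The resulting map sums to the number of annotated people, so a predicted
    density map can be converted back to a crowd count by summation.
    """
    density = np.zeros((height, width), dtype=np.float32)
    for x, y in points:  # point annotations in (x, y) pixel coordinates
        ix = min(max(int(x), 0), width - 1)
        iy = min(max(int(y), 0), height - 1)
        density[iy, ix] += 1.0
    return gaussian_filter(density, sigma)

# Example: three annotated people on a 640x480 frame (the RGBT-CC resolution).
dmap = points_to_density_map([(100, 50), (320, 240), (600, 400)], 480, 640)
print(dmap.sum())  # ~3.0
```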
Highlights
  • Crowd counting [17, 9] is a fundamental computer vision task that aims to automatically estimate the number of people in unconstrained scenes
  • To promote further research in this field, we propose a large-scale benchmark, RGBT Crowd Counting (RGBT-CC), which contains 2,030 pairs of RGB-thermal images and 138,389 annotated pedestrians.
  • To facilitate multimodal crowd counting, we introduce a cross-modal collaborative representation learning framework, which incorporates multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementarities among different modalities.
  • The proposed RGBT crowd counting framework is composed of three parallel backbones and an Information Aggregation-Distribution Module (IADM)
  • We introduce the first RGBT crowd counting benchmark with 2,030 pairs of RGB-thermal images and 138,389 annotated people
  • We develop a cross-modal collaborative representation learning framework, which utilizes a tailor-designed Information Aggregation-Distribution Module to fully capture the complementary information of different modalities
Methods
  • The authors propose a cross-modal collaborative representation learning framework for multimodal crowd counting.
  • The first class of compared baselines consists of specially-designed crowd counting models, including MCNN [62], SANet [2], CSRNet [20], and BL [32].
  • These methods are reimplemented to take the concatenation of RGB and thermal images as input in an "Early Fusion" manner.
  • The second class comprises several best-performing multimodal learning models, including UCNet [57], HDFNet [35], and BBSNet [6].
  • Based on their official codes, these methods are reimplemented to estimate crowd counts on the RGBT-CC dataset.
  • Since the IADM can be incorporated into various networks, the authors take CSRNet, MCNN, SANet, and BL as backbones to develop multiple instances of the framework (a schematic sketch of the three-branch design with a simplified IADM follows this list).
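As a rough illustration of the framework shape described above, here is a schematic PyTorch sketch of two modality-specific branches exchanging information with a modality-shared branch through a simplified aggregation-distribution block. The class name `SimpleIADM`, the 1×1 convolutions, and the residual connections are assumptions for illustration; the authors' actual IADM (including its pyramid pooling and gating details) should be taken from their official code.

```python
import torch
import torch.nn as nn

class SimpleIADM(nn.Module):
    """Schematic aggregation-distribution block (illustrative, not the paper's exact IADM).

    Aggregation: fuse RGB and thermal features into the shared branch.
    Distribution: feed the shared representation back to each specific branch.
    """
    def __init__(self, channels):
        super().__init__()
        self.aggregate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.distribute_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.distribute_thermal = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_thermal, f_shared):
        # Aggregation: inject both modality-specific features into the shared branch.
        f_shared = f_shared + self.aggregate(torch.cat([f_rgb, f_thermal], dim=1))
        # Distribution: propagate shared information back to each modality (residual-style).
        f_rgb = f_rgb + self.distribute_rgb(f_shared)
        f_thermal = f_thermal + self.distribute_thermal(f_shared)
        return f_rgb, f_thermal, f_shared
```

By contrast, the "Early Fusion" baselines above simply concatenate the RGB and thermal images channel-wise into a single six-channel input before one backbone, with no dedicated shared branch.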
Results
  • The authors' CSRNet+IADM achieves a relative improvement of 17.3% in RMSE over the thermal-based CSRNet, and with the full IADM the method achieves the best performance on all evaluation metrics (the GAME and RMSE metrics are sketched after this list).
  • All instances of the method consistently outperform the corresponding backbone networks.
  • Both MCNN+IADM and SANet+IADM achieve a relative improvement of 18.9% in RMSE over their "Early Fusion" counterparts.
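For reference, the evaluation metrics behind these numbers can be sketched as follows. GAME(l) partitions each image into a 2^l × 2^l grid (4^l non-overlapping regions) and sums per-region absolute count errors, so GAME(0) reduces to the ordinary MAE; RMSE is computed on whole-image counts. The function names below are illustrative.

```python
import numpy as np

def game(pred_density, gt_density, level):
    """Grid Average Mean absolute Error for one image at grid level `level`."""
    k = 2 ** level
    h, w = gt_density.shape
    error = 0.0
    for i in range(k):
        for j in range(k):
            rows = slice(i * h // k, (i + 1) * h // k)
            cols = slice(j * w // k, (j + 1) * w // k)
            error += abs(pred_density[rows, cols].sum() - gt_density[rows, cols].sum())
    return error  # averaged over all test images in practice

def rmse(pred_counts, gt_counts):
    """Root mean squared error over whole-image counts."""
    diff = np.asarray(pred_counts, dtype=np.float64) - np.asarray(gt_counts, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```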
Conclusion
  • The authors propose to incorporate optical and thermal information to estimate crowd counts in unconstrained scenarios.
  • To this end, the authors introduce the first RGBT crowd counting benchmark with 2,030 pairs of RGB-thermal images and 138,389 annotated people.
Tables
  • Table 1: The training, validation, and testing sets of our RGBT-CC benchmark. In each grid, the first value is the number of images and the second value is the average count per image.
  • Table 2: The performance of different inputs and different representation learning approaches on our RGBT-CC benchmark.
  • Table 3: The performance under different illumination conditions on our RGBT-CC benchmark. The unimodal data is directly fed into CSRNet, while the multimodal data is fed into our proposed framework based on CSRNet. "↓" denotes lower is better.
  • Table 4: Performance of different methods on the proposed RGBT-CC benchmark. All the methods in this table utilize both RGB and thermal images to estimate the crowd counts.
  • Table 5: Performance with different numbers of pyramid pooling levels in IADM (a minimal pyramid-pooling sketch follows this list).
  • Table 6: Performance of different methods on the ShanghaiTechRGBD benchmark. All the methods in this table utilize both RGB and depth images to estimate the crowd counts.
  • Table 7: Performance of unimodal data and multimodal data on the RGBT-CC benchmark.
  • Table 8: Performance of unimodal data and multimodal data on the ShanghaiTechRGBD benchmark.
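Table 5 ablates the number of pyramid pooling levels inside IADM. As a hedged sketch of what such a layer typically looks like, the module below pools features to several grid resolutions, upsamples them back, and fuses them with a 1×1 convolution; the level set `(1, 2, 4)` and the concatenation-plus-projection fusion are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Multi-level pyramid pooling (illustrative configuration)."""
    def __init__(self, channels, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels
        # Fuse the original features with one pooled context map per level.
        self.project = nn.Conv2d(channels * (len(levels) + 1), channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [x]
        for level in self.levels:
            p = F.adaptive_avg_pool2d(x, level)  # level x level context grid
            pooled.append(F.interpolate(p, size=(h, w), mode='bilinear',
                                        align_corners=False))
        return self.project(torch.cat(pooled, dim=1))
```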
Related Work
  • Crowd Counting Benchmarks: In recent years, we have witnessed the rapid evolution of crowd counting benchmarks. UCSD [3] and WorldExpo [56] are two early datasets that respectively contain 2,000 and 3,980 video frames with low diversity and low-to-medium densities. To alleviate the limitations of the aforementioned datasets, Zhang et al. [62] collected 1,198 images with 330,165 annotated heads, which offer better diversity in terms of scenes and density levels. Subsequently, three large-scale datasets were proposed in succession. For instance, UCF-QNRF [14] is composed of 1,535 high-density images with a total of 1.25 million pedestrians, and JHU-CROWD++ [44] contains 4,372 images with 1.51 million point annotations.
Study Subjects and Analysis
pairs: 2030
In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementary information of different modalities.

pairs: 2030
To the best of our knowledge, no attempts have been made to simultaneously explore RGB and thermal images for estimating crowd counts. In this work, to promote further research in this field, we propose a large-scale benchmark, RGBT Crowd Counting (RGBT-CC), which contains 2,030 pairs of RGB-thermal images and 138,389 annotated pedestrians. Moreover, our benchmark makes significant advances in terms of diversity and difficulty, as these RGBT images were captured in unconstrained scenes (e.g., malls, streets, and train stations) under various illumination conditions (e.g., day and night).

pedestrians: 138389
In summary, the major contributions of this work are three-fold: (i) we introduce a large-scale RGBT benchmark to promote research in the field of crowd counting, in which 138,389 pedestrians are annotated in 2,030 pairs of RGB-thermal images captured in unconstrained scenarios; (ii) we develop a cross-modal collaborative representation learning framework, which is capable of fully learning the complementarities among different modalities with our tailor-designed Information Aggregation-Distribution Module.

large-scale datasets: 3
To alleviate the limitations of the aforementioned datasets, Zhang et al. [62] collected 1,198 images with 330,165 annotated heads, which offer better diversity in terms of scenes and density levels. Subsequently, three large-scale datasets were proposed in succession. For instance, UCF-QNRF [14] is composed of 1,535 high-density images with a total of 1.25 million pedestrians.

pairs: 2030
On the basis of the coordinate mapping relation, we crop the corresponding RGB regions and resize them to 640×480. We then choose 2,030 pairs of representative RGB-thermal images for manual annotation. Among these samples, 1,013 pairs are captured in the light and 1,017 pairs in the darkness.

pairs: 1013
We then choose 2,030 pairs of representative RGB-thermal images for manual annotation. Among these samples, 1,013 pairs are captured in the light and 1,017 pairs in the darkness. A total of 138,389 pedestrians are marked with point annotations, on average 68 people per image.

pedestrians: 138389
Among these samples, 1,013 pairs are captured in the light and 1,017 pairs in the darkness. A total of 138,389 pedestrians are marked with point annotations, on average 68 people per image. The detailed distribution of people is shown in Fig. 2.

pairs: 1030
Finally, the proposed RGBT-CC benchmark is randomly divided into three parts. As shown in Table 1, 1,030 pairs are used for training, 200 pairs for validation, and 800 pairs for testing. In this work, we propose a cross-modal collaborative representation learning framework for multimodal crowd counting.

pairs: 2030
In this work, we propose to incorporate optical and thermal information to estimate crowd counts in unconstrained scenarios. To this end, we introduce the first RGBT crowd counting benchmark with 2,030 pairs of RGB-thermal images and 138,389 annotated people. Moreover, we develop a cross-modal collaborative representation learning framework, which utilizes a tailor-designed Information Aggregation-Distribution Module to fully capture the complementary information of different modalities.

References
  • Shuai Bai, Zhiqun He, Yu Qiao, Hanzhe Hu, Wei Wu, and Junjie Yan. Adaptive dilated network with self-correction supervision for counting. In CVPR, pages 4594–4603, 2020.
  • Xinkun Cao, Zhipeng Wang, Yanyun Zhao, and Fei Su. Scale aggregation network for accurate and efficient crowd counting. In ECCV, pages 734–750, 2018.
  • Antoni B. Chan, Zhang-Sheng John Liang, and Nuno Vasconcelos. Privacy preserving crowd monitoring: Counting people without people models or tracking. In CVPR, pages 1–7. IEEE, 2008.
  • A. B. Chan and N. Vasconcelos. Bayesian Poisson regression for crowd counting. In ICCV, pages 545–551, 2009.
  • Ke Chen, Chen Change Loy, Shaogang Gong, and Tony Xiang. Feature mining for localised crowd counting. In BMVC, volume 1, page 3, 2012.
  • Deng-Ping Fan, Yingjie Zhai, Ali Borji, Jufeng Yang, and Ling Shao. BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network. In ECCV, 2020.
  • Keren Fu, Deng-Ping Fan, Ge-Peng Ji, and Qijun Zhao. JL-DCF: Joint learning and densely-cooperative fusion framework for RGB-D salient object detection. In CVPR, pages 3052–3062, 2020.
  • Min Fu, Pei Xu, Xudong Li, Qihe Liu, Mao Ye, and Ce Zhu. Fast crowd density estimation with convolutional neural networks. EAAI, 43:81–88, 2015.
  • Guangshuai Gao, Junyu Gao, Qingjie Liu, Qi Wang, and Yunhong Wang. CNN-based density estimation and crowd counting: A survey. arXiv preprint arXiv:2003.12783, 2020.
  • Isha Ghodgaonkar, Subhankar Chakraborty, Vishnu Banna, Shane Allcroft, Mohammed Metwaly, Fischer Bordwell, Kohsuke Kimura, Xinxin Zhao, Abhinav Goel, Caleb Tung, et al. Analyzing worldwide social distancing through large-scale computer vision. arXiv preprint arXiv:2008.12363, 2020.
  • Ricardo Guerrero-Gomez-Olmedo, Beatriz Torre-Jimenez, Roberto Lopez-Sastre, Saturnino Maldonado-Bascon, and Daniel Onoro-Rubio. Extremely overlapping vehicle counting. In Iberian Conference on Pattern Recognition and Image Analysis, pages 423–431. Springer, 2015.
  • Yuhang He, Zhiheng Ma, Xing Wei, Xiaopeng Hong, Wei Ke, and Yihong Gong. Error-aware density isomorphism reconstruction for unsupervised cross-domain crowd counting. In AAAI, 2021.
  • Haroon Idrees, Imran Saleemi, Cody Seibert, and Mubarak Shah. Multi-source multi-scale counting in extremely dense crowd images. In CVPR, pages 2547–2554, 2013.
  • Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, and Mubarak Shah. Composition loss for counting, density map estimation and localization in dense crowds. In ECCV, 2018.
  • Bo Jiang, Zitai Zhou, Xiao Wang, and Jin Tang. cmSalGAN: RGB-D salient object detection with cross-view generative adversarial networks. TMM, 2020.
  • Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, and Ling Shao. Crowd counting and density estimation by trellis encoder-decoder networks. In CVPR, pages 6133–6142, 2019.
  • Di Kang, Zheng Ma, and Antoni B. Chan. Beyond counting: comparisons of density maps for crowd analysis tasks—counting, detection, and tracking. CSVT, 29(5):1408–1422, 2018.
  • Douwe Kiela and Leon Bottou. Learning image embeddings using convolutional neural networks for improved multimodal semantics. In EMNLP, pages 36–45, 2014.
  • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Yuhong Li, Xiaofan Zhang, and Deming Chen. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In CVPR, pages 1091–1100, 2018.
  • Dongze Lian, Jing Li, Jia Zheng, Weixin Luo, and Shenghua Gao. Density map regression guided detection network for RGB-D crowd counting and localization. In CVPR, pages 1821–1830, 2019.
  • Jiang Liu, Chenqiang Gao, Deyu Meng, and Alexander G. Hauptmann. DecideNet: Counting varying density crowds through attention guided detection and density estimation. In CVPR, pages 5197–5206, 2018.
  • Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, and Liang Lin. Efficient crowd counting via structured knowledge transfer. In ACM MM, 2020.
  • Liang Liu, Hao Lu, Hongwei Zou, Haipeng Xiong, Zhiguo Cao, and Chunhua Shen. Weighing counts: Sequential crowd counting by reinforcement learning. arXiv preprint arXiv:2007.08260, 2020.
  • Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, and Liang Lin. Crowd counting with deep structured scale integration network. In ICCV, pages 1774–1783, 2019.
  • Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, and Liang Lin. Crowd counting using deep recurrent spatial-aware network. In IJCAI, 2018.
  • Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, and Liang Lin. Dynamic spatial-temporal representation learning for traffic flow prediction. TITS, 2020.
  • Ning Liu, Yongchao Long, Changqing Zou, Qun Niu, Li Pan, and Hefeng Wu. ADCrowdNet: An attention-injective deformable convolutional network for crowd understanding. In CVPR, pages 3225–3234, 2019.
  • Weizhe Liu, Mathieu Salzmann, and Pascal Fua. Context-aware crowd counting. In CVPR, pages 5099–5108, 2019.
  • Yan Liu, Lingqiao Liu, Peng Wang, Pingping Zhang, and Yinjie Lei. Semi-supervised crowd counting via self-training on surrogate tasks. In ECCV, 2020.
  • Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, and Nenghai Yu. Cross-modality person re-identification with shared-specific feature transfer. In CVPR, pages 13379–13389, 2020.
  • Zhiheng Ma, Xing Wei, Xiaopeng Hong, and Yihong Gong. Bayesian loss for crowd count estimation with point supervision. In ICCV, pages 6142–6151, 2019.
  • Zhiheng Ma, Xing Wei, Xiaopeng Hong, and Yihong Gong. Learning scales from points: A scale-aware probabilistic model for crowd counting. In ACM MM, pages 220–228, 2020.
  • Zhiheng Ma, Xing Wei, Xiaopeng Hong, Hui Lin, Yunfeng Qiu, and Yihong Gong. Learning to reason: Leveraging neural networks for approximate DNF counting. In AAAI, 2021.
  • Youwei Pang, Lihe Zhang, Xiaoqi Zhao, and Huchuan Lu. Hierarchical dynamic filtering network for RGB-D salient object detection. In ECCV, 2020.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In NIPS, pages 8026–8037, 2019.
  • Yongri Piao, Wei Ji, Jingjing Li, Miao Zhang, and Huchuan Lu. Depth-induced multi-scale recurrent attention network for saliency detection. In ICCV, pages 7254–7263, 2019.
  • Yongri Piao, Zhengkun Rong, Miao Zhang, Weisong Ren, and Huchuan Lu. A2dele: Adaptive and attentive depth distiller for efficient RGB-D salient object detection. In CVPR, pages 9060–9069, 2020.
  • Zhilin Qiu, Lingbo Liu, Guanbin Li, Qing Wang, Nong Xiao, and Liang Lin. Crowd counting via multi-view scale aggregation networks. In ICME, pages 1498–1503. IEEE, 2019.
  • Deepak Babu Sam, Shiv Surya, and R. Venkatesh Babu. Switching convolutional neural network for crowd counting. In CVPR, volume 1, page 6, 2017.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • Vishwanath A. Sindagi and Vishal M. Patel. Generating high-quality crowd density maps using contextual pyramid CNNs. In ICCV, pages 1879–1888. IEEE, 2017.
  • Vishwanath A. Sindagi and Vishal M. Patel. Multi-level bottom-top and top-bottom feature fusion for crowd counting. In ICCV, pages 1002–1012, 2019.
  • Vishwanath A. Sindagi, Rajeev Yasarla, and Vishal M. Patel. JHU-CROWD++: Large-scale crowd counting dataset and a benchmark method. TPAMI, 2020.
  • Tao Sun, Zonglin Di, Pengyu Che, Chun Liu, and Yin Wang. Leveraging crowdsourced GPS data for road extraction from aerial imagery. In CVPR, pages 7509–7518, 2019.
  • Thirumalaisamy P. Velavan and Christian G. Meyer. The COVID-19 epidemic. Tropical Medicine & International Health, 25(3):278, 2020.
  • Elad Walach and Lior Wolf. Learning to count with CNN boosting. In ECCV, pages 660–676. Springer, 2016.
  • Chuan Wang, Hua Zhang, Liang Yang, Si Liu, and Xiaochun Cao. Deep people counting in extremely dense crowds. In ACM MM, pages 1299–1302, 2015.
  • Qi Wang, Junyu Gao, Wei Lin, and Xuelong Li. NWPU-Crowd: A large-scale benchmark for crowd counting and localization. TPAMI, 2020.
  • Hao Wu, Hanyuan Zhang, Xinyu Zhang, Weiwei Sun, Baihua Zheng, and Yuning Jiang. DeepDualMapper: A gated fusion network for automatic map extraction using aerial images and trajectories, 2020.
  • Feng Xiong, Xingjian Shi, and Dit-Yan Yeung. Spatiotemporal modeling for crowd counting in videos. In ICCV, pages 5151–5159, 2017.
  • Lixian Yuan, Zhilin Qiu, Lingbo Liu, Hefeng Wu, Tianshui Chen, Pei Chen, and Liang Lin. Crowd counting via scale-communicative aggregation networks. Neurocomputing, 409:420–430, 2020.
  • Yingjie Zhai, Deng-Ping Fan, Jufeng Yang, Ali Borji, Ling Shao, Junwei Han, and Liang Wang. Bifurcated backbone strategy for RGB-D salient object detection. arXiv preprint, 2020.
  • Anran Zhang, Jiayi Shen, Zehao Xiao, Fan Zhu, Xiantong Zhen, Xianbin Cao, and Ling Shao. Relational attention network for crowd counting. In ICCV, pages 6788–6797, 2019.
  • Anran Zhang, Lei Yue, Jiayi Shen, Fan Zhu, Xiantong Zhen, Xianbin Cao, and Ling Shao. Attentional neural fields for crowd counting. In ICCV, pages 5714–5723, 2019.
  • Cong Zhang, Hongsheng Li, Xiaogang Wang, and Xiaokang Yang. Cross-scene crowd counting via deep convolutional neural networks. In CVPR, pages 833–841, 2015.
  • Jing Zhang, Deng-Ping Fan, Yuchao Dai, Saeed Anwar, Fatemeh Sadat Saleh, Tong Zhang, and Nick Barnes. UC-Net: Uncertainty inspired RGB-D saliency detection via conditional variational autoencoders. In CVPR, pages 8582–8591, 2020.
  • Liliang Zhang, Liang Lin, Xiaodan Liang, and Kaiming He. Is Faster R-CNN doing well for pedestrian detection? In ECCV, pages 443–457. Springer, 2016.
  • Shuo Zhang, Youfang Lin, and Hao Sheng. Residual networks for light field image super-resolution. In CVPR, pages 11046–11055, 2019.
  • Shanghang Zhang, Guanhang Wu, Joao P. Costeira, and Jose M. F. Moura. Understanding traffic density from large-scale web camera data. In CVPR, pages 5898–5907, 2017.
  • Shizhou Zhang, Yifei Yang, Peng Wang, Xiuwei Zhang, and Yanning Zhang. Attend to the difference: Cross-modality person re-identification via contrastive correlation. arXiv preprint arXiv:1910.11656, 2019.
  • Yingying Zhang, Desen Zhou, Siqin Chen, Shenghua Gao, and Yi Ma. Single-image crowd counting via multi-column convolutional neural network. In CVPR, pages 589–597, 2016.
  • He Zhao and Richard P. Wildes. Spatiotemporal feature residual propagation for action prediction. In ICCV, pages 7003–7012, 2019.
  • Jia-Xing Zhao, Yang Cao, Deng-Ping Fan, Ming-Ming Cheng, Xuan-Yi Li, and Le Zhang. Contrast prior and fluid pyramid integration for RGB-D salient object detection. In CVPR, pages 3927–3936, 2019.
  • Desen Zhou and Qian He. Cascaded multi-task learning of head segmentation and density regression for RGB-D crowd counting. IEEE Access, 2020.