Panda: A Gigapixel-Level Human-Centric Video Dataset

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)(2020)

引用 71|浏览623
暂无评分
摘要
We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (similar to 1 km(2) area) and high-resolution details (similar to gigapixel-level/frame). The scenes may contain 4k head counts with over 100x scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions. We benchmark the human detection and tracking tasks. Due to the vast variance of pedestrian pose, scale, occlusion and trajectory, existing approaches are challenged by both accuracy and efficiency. Given the uniqueness of PANDA with both wide FoV and high resolution, a new task of interaction-aware group detection is introduced. We design a 'global-to-local zoom-in' framework, where global trajectories and local interactions are simultaneously encoded, yielding promising results. We believe PANDA will contribute to the community of artificial intelligence and praxeology by understanding human behaviors and interactions in large-scale real-world scenes. PANDA Website: http://www.panda-dataset.com.
更多
查看译文
关键词
gigapixel-level human-centric video dataset,bounding boxes,artificial intelligence,local interactions,global trajectories,global-to-local zoom-in framework,pedestrian pose,fine-grained attribute labels,PANDA Website,human behaviors,interaction-aware group detection,tracking tasks,human detection,hierarchical ground-truth annotations,enriched ground-truth annotations,real-world scenes,gigapixel camera,multiobject visual analysis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要