Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024(2023)
摘要
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video
dataset and benchmark challenge. Ego-Exo4D centers around
simultaneously-captured egocentric and exocentric video of skilled human
activities (e.g., sports, music, dance, bike repair). More than 800
participants from 13 cities worldwide performed these activities in 131
different natural scene contexts, yielding long-form captures from 1 to 42
minutes each and 1,422 hours of video combined. The multimodal nature of the
dataset is unprecedented: the video is accompanied by multichannel audio, eye
gaze, 3D point clouds, camera poses, IMU, and multiple paired language
descriptions -- including a novel "expert commentary" done by coaches and
teachers and tailored to the skilled-activity domain. To push the frontier of
first-person video understanding of skilled human activity, we also present a
suite of benchmark tasks and their annotations, including fine-grained activity
understanding, proficiency estimation, cross-view translation, and 3D hand/body
pose. All resources will be open sourced to fuel new research in the community.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要