Abductive Ego-View Accident Video Understanding for Safe Driving Perception
CVPR 2024(2024)
摘要
We present MM-AU, a novel dataset for Multi-Modal Accident video
Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each
with temporally aligned text descriptions. We annotate over 2.23 million object
boxes and 58,650 pairs of video-based accident reasons, covering 58 accident
categories. MM-AU supports various accident understanding tasks, particularly
multimodal video diffusion to understand accident cause-effect chains for safe
driving. With MM-AU, we present an Abductive accident Video understanding
framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video
diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven
by an abductive CLIP model. This model involves a contrastive interaction loss
to learn the pair co-occurrence of normal, near-accident, accident frames with
the corresponding text descriptions, such as accident reasons, prevention
advice, and accident categories. OAVD enforces the causal region learning while
fixing the content of the original frame background in video generation, to
find the dominant cause-effect chain for certain accidents. Extensive
experiments verify the abductive ability of AdVersa-SD and the superiority of
OAVD against the state-of-the-art diffusion models. Additionally, we provide
careful benchmark evaluations for object detection and accident reason
answering since AdVersa-SD relies on precise object and accident reason
information.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要