Progressive prediction: Video anomaly detection via multi-grained prediction
IET Image Processing (2024)
Abstract
Video Anomaly Detection (VAD) has been an active research field for several decades. However, most existing approaches extract only a single type of feature from videos and rely on a single paradigm to measure the degree of abnormality. A coarse-to-fine, three-level prediction is built by integrating spatio-temporal representations at different levels of granularity, which better highlights the difference between normal and abnormal behaviors. First, an object-level trajectory prediction models each person's historical positions using a graph transformer network. Next, skeleton-level prediction is performed by incorporating the positional information from the trajectory prediction. More importantly, the predicted skeleton then guides a pixel-level prediction of human regions. A novel Skeleton Conditioned Generative Adversarial Network (SCGAN) is designed to exploit the correlation between skeleton-level and pixel-level motion prediction, so that the prediction of human regions draws on both coarse-grained and fine-grained motion features. This three-level prediction, namely Progressive Prediction Video Anomaly Detection (P3VAD), enlarges the prediction error on irregular motion patterns. In addition, a pixel-level analysis method is proposed to achieve Background-bias Elimination (BE) and to denoise the predicted regions. This is the first effort to progressively combine three levels of prediction, from coarse to fine grained, for VAD. We demonstrate the effectiveness of P3VAD through an extensive experimental evaluation on four large-scale public benchmark datasets (ShanghaiTech, CUHK Avenue, IITB-Corridor, and ADOC) under both micro-AUC and macro-AUC metrics.
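The abstract reports results under both micro-AUC and macro-AUC. As these terms are commonly used in the VAD literature, micro-AUC pools the frames of all test videos into one ROC computation, while macro-AUC computes the AUC per video and averages the results. The sketch below illustrates both metrics on frame-level anomaly scores. It also includes a hypothetical `fuse_errors` helper that min-max normalizes trajectory-, skeleton-, and pixel-level prediction errors and sums them with assumed weights. The abstract does not say how P3VAD combines its three levels, so treat that helper as an illustration only. Skipping single-class videos in the macro average is also an assumption, and the toy scores are invented.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic; label 1 = anomalous frame."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    # Assign 1-based ranks; tied scores share their average rank.
    order = np.argsort(scores, kind="stable")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    # U counts (anomalous, normal) pairs ordered correctly, ties counted as 1/2.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def minmax(x):
    """Scale a per-video error curve to [0, 1]; a constant curve maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def fuse_errors(traj_err, skel_err, pixel_err, weights=(1.0, 1.0, 1.0)):
    """HYPOTHETICAL fusion of three-level errors into one frame score per frame.

    Assumes per-video min-max normalization and a weighted sum; not taken
    from the paper.
    """
    levels = (traj_err, skel_err, pixel_err)
    return sum(w * minmax(e) for w, e in zip(weights, levels))


def micro_auc(videos):
    """Pool frames from all (labels, scores) videos, then compute one AUC."""
    labels = np.concatenate([np.asarray(l) for l, _ in videos])
    scores = np.concatenate([np.asarray(s) for _, s in videos])
    return roc_auc(labels, scores)


def macro_auc(videos):
    """Average per-video AUCs, skipping videos whose frames share one label."""
    aucs = [roc_auc(l, s) for l, s in videos if 0 < np.sum(l) < len(l)]
    return float(np.mean(aucs))


if __name__ == "__main__":
    # Two toy test videos: (frame labels, frame anomaly scores).
    videos = [
        ([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]),
        ([0, 1, 0, 1], [0.3, 0.4, 0.5, 0.6]),
    ]
    print(micro_auc(videos))  # 0.9375
    print(macro_auc(videos))  # 0.875
```

The two metrics differ here because pooling lets the confidently scored first video compensate for the second. Macro-AUC, by contrast, weights each video equally regardless of its length.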
Key words
computer vision, unsupervised learning, video signal processing, video surveillance