Trajectory-pooled Spatial-temporal Architecture of Deep Convolutional Neural Networks for Video Event Detection

IEEE Transactions on Circuits and Systems for Video Technology (2019)

Abstract
Content-based video event detection faces great challenges due to complex scenes and blurred actions in surveillance videos. To alleviate these challenges, we propose a novel spatial-temporal architecture of deep Convolutional Neural Networks for this task. Taking advantage of spatial-temporal information, we fine-tune two-stream networks and then fuse spatial and temporal features at convolutional layers using a 2D pooling fusion method to enforce the consistency of spatial-temporal information. Based on the two-stream networks and the spatial-temporal layer, a triple-channel model is obtained. Furthermore, we apply trajectory-constrained pooling to both deep features and hand-crafted features to combine their merits. A fusion method over the three channels yields the final detection result. Experiments on two benchmark surveillance video datasets, VIRAT 1.0 and VIRAT 2.0, which involve a suite of challenging events such as a person loading an object into a vehicle or a person opening a vehicle trunk, show that the proposed method achieves superior performance compared with state-of-the-art methods on these event benchmarks.
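
For readers who want a concrete picture of the two pooling ideas mentioned in the abstract, the short NumPy sketch below illustrates (a) fusing spatial (RGB-stream) and temporal (flow-stream) convolutional feature maps followed by 2D pooling, and (b) pooling deep features along trajectory points. The element-wise max fusion, the 2x2 pooling window, the tensor shapes, and all function names are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def pool2d(x, k=2):
    """Non-overlapping 2D max pooling over a (C, H, W) feature map."""
    c, h, w = x.shape
    h, w = h - h % k, w - w % k        # crop H, W to multiples of k
    x = x[:, :h, :w]
    # Split each spatial axis into k-sized blocks and take the block max.
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def fuse_streams(spatial_feat, temporal_feat, k=2):
    """Fuse two equally shaped conv feature maps, then pool spatially.
    Element-wise max fusion is an assumed stand-in for the paper's
    2D pooling fusion method."""
    fused = np.maximum(spatial_feat, temporal_feat)
    return pool2d(fused, k)

def trajectory_pool(feat, trajectory):
    """Average deep features sampled at trajectory points (an assumed
    variant of trajectory-constrained pooling); trajectory is a list
    of (y, x) positions on the feature map grid."""
    ys, xs = zip(*trajectory)
    return feat[:, list(ys), list(xs)].mean(axis=1)  # one value per channel

# Toy usage with random stand-ins for the two streams' conv features.
rng = np.random.default_rng(0)
spatial = rng.standard_normal((256, 14, 14))
temporal = rng.standard_normal((256, 14, 14))
fused = fuse_streams(spatial, temporal)                     # -> (256, 7, 7)
desc = trajectory_pool(spatial, [(3, 4), (5, 6), (7, 8)])   # -> (256,)
print(fused.shape, desc.shape)

The fused map can then feed the spatial-temporal channel of a triple-channel model, while the trajectory descriptor plays the role of a trajectory-pooled deep feature; both roles here are inferred from the abstract rather than taken from the paper's implementation.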
Keywords
Trajectory, Event detection, Feature extraction, Computer vision, Computer architecture, Image motion analysis, Fuses