SpotFormer: A Transformer-based Framework for Precise Soccer Action Spotting
2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), 2022
Abstract
Action spotting and classification consist of detecting the exact moments at which events occur in long videos. Current mainstream spotting approaches generally use a two-stage pipeline: feature collection and integration, followed by salient action detection and postprocessing. Following this design, we present SpotFormer, a simple yet effective framework capable of precise action spotting. Specifically, we employ several state-of-the-art backbone networks as auxiliary feature extractors and reduce feature dimensionality in a straightforward and efficient way. The frame-wise features are fed into a transformer-based spotting network devised to leverage spatiotemporal information. Using a model ensemble, we obtain a tight mAP score of 0.609 and achieve state-of-the-art performance on the SoccerNet-v2 dataset.
Keywords
video understanding, action spotting, action recognition
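
The abstract mentions a postprocessing step after salient action detection but does not say which method is used. As an illustration only, the sketch below shows greedy temporal non-maximum suppression, a common postprocessing choice in action spotting; it is an assumption here, not SpotFormer's confirmed method. The function name `temporal_nms` and the parameters `window` and `threshold` are hypothetical.

```python
from typing import List, Tuple


def temporal_nms(
    scores: List[float], window: int, threshold: float
) -> List[Tuple[int, float]]:
    """Greedy temporal non-maximum suppression over per-frame confidences.

    Illustrative only: the source does not specify its postprocessing.
    Returns (frame_index, score) pairs sorted by frame index.
    """
    # Candidate frames at or above the threshold, highest score first.
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in candidates:
        # Keep a frame only if every already-kept (stronger) frame
        # is more than `window` frames away.
        if all(abs(i - j) > window for j in kept):
            kept.append(i)
    return [(i, scores[i]) for i in sorted(kept)]


if __name__ == "__main__":
    # Hypothetical per-frame confidences for a single action class.
    frame_scores = [0.1, 0.8, 0.9, 0.7, 0.1, 0.2, 0.6, 0.1]
    print(temporal_nms(frame_scores, window=2, threshold=0.5))
```

With these example scores, the frames next to the peak at index 2 are suppressed, while the separate peak at index 6 survives because it lies outside the suppression window. A tight evaluation metric rewards predictions close to the annotated moment, so a step of this kind keeps one spot per action instead of a cluster of near-duplicates.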