F2S-Net: learning frame-to-segment prediction for online action detection

Journal of Real-Time Image Processing(2024)

引用 0|浏览2
暂无评分
摘要
Online action detection (OAD) aims at predicting action per frame from a streaming untrimmed video in real time. Most existing approaches leverage all the historical frames in the sliding window as the temporal context of the current frame since single-frame prediction is often unreliable. However, such a manner inevitably introduces useless even noisy video content, which often misleads action classifier when recognizing the ongoing action in the current frame. To alleviate this difficulty, we propose a concise and novel F2S-Net, which can adaptively discover the contextual segments in the online sliding window, and convert current frame prediction into relevant-segment prediction. More specifically, as the current frame can be either action or background, we develop F2S-Net with a distinct two-branch structure, i.e., the action (or background) branch can exploit the action (or background) segments. Via multi-level action supervision, these two branches can complementarily enhance each other, allowing to identify the contextual segments in the sliding window to robustly predict what is ongoing. We evaluate our approach on popular OAD benchmarks, i.e., THUMOS-14, TVSeries and HDD. The extensive results show that our F2S-Net outperforms the recent state-of-the-art approaches.
更多
查看译文
关键词
Online action detection,Contextual segment prediction,Multi-level action supervision
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要