Cascaded Boundary Network for High-Quality Temporal Action Proposal Generation

IEEE Transactions on Circuits and Systems for Video Technology (2020)

Abstract
Generating high-quality temporal action proposals is fundamental to accurate action detection in untrimmed videos, yet it remains challenging because backgrounds are complex and actions vary in duration and magnitude. Because precise boundary information is essential to accurate proposals, we propose a cascaded boundary network (CBN) that predicts action boundaries in two stages. The first stage locates temporal boundaries by predicting, for each frame, the probabilities that it belongs to an action, marks a start position, or marks an end position; a temporal convolutional network is used in this stage to capture short-term context. The predicted probabilities are then forwarded to the second stage, in which a long short-term memory (LSTM) network refines them by exploiting the correlations among the predicted probabilities to capture long-term context. Finally, the results of both stages are combined to fuse long- and short-term information. Experiments on THUMOS14 and ActivityNet-1.3 show that CBN achieves state-of-the-art recall. The improvement is especially pronounced for a small average number (AN) of retrieved proposals; e.g., the average recall at AN=50 on THUMOS14 improves from 37.46% to 43.06%. In further experiments, proposals generated by CBN are fed into an existing action detection framework, where CBN also achieves state-of-the-art average mAP@tIoU on the THUMOS14 detection benchmark.
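The abstract reports average recall at a fixed number of retrieved proposals (e.g., AN=50). As a rough illustration of how such a figure can be computed, the sketch below scores proposals for a single video by temporal IoU (tIoU) against ground-truth segments and averages recall over a set of tIoU thresholds. This is a minimal sketch, not the authors' evaluation code: the function names, the single-video simplification (AN is an average over videos in the benchmarks), and the threshold values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(proposals: Sequence[Segment],
              ground_truth: Sequence[Segment],
              threshold: float) -> float:
    """Fraction of ground-truth segments matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for gt in ground_truth
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)


def average_recall(scored_proposals: Sequence[Tuple[Segment, float]],
                   ground_truth: Sequence[Segment],
                   top_n: int,
                   thresholds: Sequence[float]) -> float:
    """Mean recall over tIoU thresholds, using the top_n highest-scored proposals.

    Assumption: one video, so top_n stands in for the average number (AN)
    of proposals per video used in the benchmarks.
    """
    ranked = sorted(scored_proposals, key=lambda sp: sp[1], reverse=True)
    kept: List[Segment] = [segment for segment, _ in ranked[:top_n]]
    return sum(recall_at(kept, ground_truth, t) for t in thresholds) / len(thresholds)


# Toy data (made up for illustration, not from the paper).
ground_truth = [(0.0, 10.0), (20.0, 30.0)]
scored = [((0.0, 10.0), 0.9), ((21.0, 30.0), 0.8), ((50.0, 60.0), 0.7)]
ar_top2 = average_recall(scored, ground_truth, top_n=2, thresholds=[0.5, 0.95])
ar_top1 = average_recall(scored, ground_truth, top_n=1, thresholds=[0.5, 0.95])
```

With this toy data, keeping two proposals recalls both ground-truth segments at the loose threshold but only one at the strict threshold, which is why tighter boundaries raise average recall when few proposals are retrieved.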
Keywords
Proposals,Videos,Feature extraction,Task analysis,Object detection,Visualization,Correlation