Egocentric Action Anticipation by Disentangling Encoding and Inference

2019 IEEE International Conference on Image Processing (ICIP)

Cited by 6 | Viewed 30 times
Abstract
Egocentric action anticipation consists in predicting future actions from videos collected by means of a wearable camera. Action anticipation methods should be able to continuously 1) summarize the past and 2) predict possible future actions. We observe that action anticipation benefits from explicitly disentangling the two tasks. To this aim, we introduce a learning architecture which makes use of a "rolling" LSTM to continuously summarize the past and an "unrolling" LSTM to anticipate future actions at multiple temporal scales. The model includes a spatial and a temporal branch which process RGB images and optical flow fields independently. The predictions performed by the two branches are fused using a novel modality attention mechanism which leverages the complementary nature of the modalities. Experiments on the EPIC-KITCHENS dataset show that the proposed method surpasses the state-of-the-art by +4.02% and +6.39% when considering Top-1 and Top-5 accuracy respectively. Please see the project webpage at http://iplab.dmi.unict.it/rulstm/.
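The modality attention fusion described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes two modalities (RGB and optical flow), each producing a vector of class scores, fused by softmax-normalized attention weights. The function names and the way attention logits are obtained are assumptions for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(rgb_scores, flow_scores, attn_logits):
    """Fuse per-modality class scores with attention weights.

    rgb_scores / flow_scores: class-score vectors from the spatial
    (RGB) and temporal (optical flow) branches.
    attn_logits: two scalars scoring how much to trust each modality
    (in the paper these would be predicted by a learned sub-network;
    here they are simply given).
    """
    w_rgb, w_flow = softmax(attn_logits)
    return [w_rgb * r + w_flow * f
            for r, f in zip(rgb_scores, flow_scores)]
```

For example, equal attention logits yield an even average of the two branches, while a strongly positive RGB logit makes the fused prediction follow the RGB branch almost exclusively; this is the sense in which the mechanism leverages the complementary nature of the modalities.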
Keywords
EPIC-KITCHENS, First Person Vision, Egocentric Vision, Action Anticipation, LSTM