MILA - Multi-Task Learning from Videos via Efficient Inter-Frame Attention.

arXiv (2021)

Cited by 3 | Views 34
Abstract
Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where a slow network runs on sparsely sampled keyframes and a fast, shallow network runs on non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features, so that keyframes and non-keyframes are well aligned. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. MILA obtains accuracy competitive with the state of the art on two multi-task learning benchmarks while reducing the number of floating-point operations (FLOPs) by up to 70%. In addition, our attention-based feature propagation method (ILA) outperforms prior work in task accuracy while also reducing FLOPs by up to 90%.
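A minimal sketch of the mechanism the abstract describes: a slow network runs on sparse keyframes, a shallow fast network runs on every other frame, and a local inter-frame attention module propagates cached keyframe features to the non-keyframe features. This is an illustrative PyTorch reconstruction under stated assumptions, not the authors' implementation; the names `InterFrameLocalAttention` and `run_video`, the 3x3 attention window, the keyframe interval `key_every=5`, and the dummy networks are all assumptions made for the example.

```python
# Hypothetical sketch of a slow-fast pipeline with inter-frame local attention.
# Module names, layer sizes, and the window size are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterFrameLocalAttention(nn.Module):
    """Attend from each non-keyframe feature location to a local window
    of the cached keyframe features (assumed formulation)."""

    def __init__(self, channels: int, window: int = 3):
        super().__init__()
        self.window = window
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, fast_feat, key_feat):
        b, c, h, w = fast_feat.shape
        q = self.query(fast_feat)                    # (B, C, H, W)
        k = self.key(key_feat)
        v = self.value(key_feat)
        # Gather a local window around each spatial position of the keyframe.
        pad = self.window // 2
        k = F.unfold(k, self.window, padding=pad)    # (B, C*win*win, H*W)
        v = F.unfold(v, self.window, padding=pad)
        k = k.view(b, c, self.window ** 2, h * w)
        v = v.view(b, c, self.window ** 2, h * w)
        q = q.view(b, c, 1, h * w)
        # Scaled dot-product attention over the window positions only.
        attn = (q * k).sum(dim=1, keepdim=True) / (c ** 0.5)
        attn = attn.softmax(dim=2)                   # (B, 1, win*win, H*W)
        out = (attn * v).sum(dim=2)                  # (B, C, H*W)
        return out.view(b, c, h, w)


def run_video(frames, slow_net, fast_net, attention, task_head, key_every=5):
    """Run the slow net on sparse keyframes and the fast net plus
    inter-frame attention on the remaining frames."""
    outputs, key_feat = [], None
    for t, frame in enumerate(frames):
        if t % key_every == 0:
            key_feat = slow_net(frame)               # expensive, runs sparsely
            feat = key_feat
        else:
            fast_feat = fast_net(frame)              # cheap, runs every frame
            feat = fast_feat + attention(fast_feat, key_feat)
        outputs.append(task_head(feat))
    return outputs


if __name__ == "__main__":
    # Dummy stand-ins for the slow/fast backbones and one task head.
    slow = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 64, 3, padding=1))
    fast = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))
    attn = InterFrameLocalAttention(64)
    head = nn.Conv2d(64, 19, 1)                      # e.g. a segmentation head
    video = [torch.randn(1, 3, 64, 64) for _ in range(10)]
    preds = run_video(video, slow, fast, attn, head)
    print(len(preds), preds[0].shape)
```

In a sketch like this, the local window keeps the attention cost linear in image size (each query attends to window² keyframe positions rather than all H×W positions), which is where savings relative to full inter-frame attention would come from.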
Keywords
task-specific attention, non-keyframes, adversarial learning strategy, multi-task learning benchmarks, MILA, videos, inter-frame local attention, slow-fast architecture, sparsely sampled keyframes, fast shallow network, feature learning, low-latency multi-task learning, floating-point operations, FLOPs, attention-based feature propagation method, ILA