When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs

ACM Transactions on Multimedia Computing, Communications, and Applications (2022)

Abstract
We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual annotations of temporal localization for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effective method to localize the narrated actions based on their expected duration. Through several experiments and analyses, we show that our method brings complementary information with respect to previous methods, and leads to improvements over previous work for the task of temporal action localization.
Keywords
Action temporal localization, action duration, vlogs, natural language processing, video processing, multimodal processing