Recognizing manipulation actions in arts and crafts shows using domain-specific visual and textual cues.

Benjamin Sapp, Rizwan Chaudhry, Xiaodong Yu, Gautam Singh, Ian Perera, Francis Ferraro, Evelyne Tzoukermann, Jana Kosecka, Jan Neumann

ICCV Workshops (2011)

Abstract

We present an approach for automatic annotation of commercial videos from an arts-and-crafts domain with the aid of textual descriptions. The main focus is on recognizing both manipulation actions (e.g., cut, draw, glue) and the tools used to perform these actions (e.g., markers, brushes, glue bottles). We demonstrate how multiple visual cues, such as motion descriptors, object presence, and hand poses, can be combined with the help of contextual priors that are automatically extracted from associated transcripts or online instructions. Using these diverse features and linguistic information, we propose several increasingly complex computational models for recognizing elementary manipulation actions and composite activities, as well as their temporal order. The approach is evaluated on a novel dataset comprising 27 episodes of PBS Sprout TV, each containing on average 8 manipulation actions.
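
As a rough illustration of how visual cues might be combined with a text-derived prior, the sketch below fuses per-cue likelihoods with a prior estimated from verb counts in a transcript. It is not the paper's model: the naive-Bayes-style fusion, the function names `transcript_prior` and `fuse`, the example transcript, and all numeric scores are assumptions made up for illustration.

```python
import math

# Hypothetical action vocabulary, taken from the examples in the abstract.
ACTIONS = ["cut", "draw", "glue"]


def transcript_prior(transcript, actions, smoothing=1.0):
    """Estimate P(action) from how often each action verb occurs in the text.

    Add-one style smoothing keeps unseen actions from getting zero probability.
    """
    words = transcript.lower().split()
    counts = {a: words.count(a) + smoothing for a in actions}
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def fuse(cue_likelihoods, prior):
    """Combine per-cue likelihoods P(cue | action) with the prior.

    Assumes the cues are conditionally independent given the action
    (an illustrative simplification, not a claim about the paper).
    """
    log_post = {
        a: math.log(p) + sum(math.log(cue[a]) for cue in cue_likelihoods)
        for a, p in prior.items()
    }
    # Normalize in a numerically stable way.
    m = max(log_post.values())
    unnorm = {a: math.exp(v - m) for a, v in log_post.items()}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}


# Made-up transcript and cue scores for one video segment.
transcript = "first cut the paper then glue the pieces and glue the eyes"
prior = transcript_prior(transcript, ACTIONS)
motion = {"cut": 0.5, "draw": 0.3, "glue": 0.2}    # hypothetical motion-descriptor scores
objects = {"cut": 0.2, "draw": 0.2, "glue": 0.6}   # hypothetical object-presence scores

posterior = fuse([motion, objects], prior)
best = max(posterior, key=posterior.get)
```

In this toy setup the motion cue alone favors "cut", but the object cue and the transcript prior together shift the decision toward "glue", which is the kind of disambiguation a contextual prior is meant to provide.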

Keywords

art, feature extraction, image motion analysis, video retrieval, video signal processing, automatic annotation, commercial video, craft, domain-specific textual cue, domain-specific visual cue, hand pose, linguistic information, manipulation action, motion descriptor, multiple visual cues, object presence