Recognizing manipulation actions in arts and crafts shows using domain-specific visual and textual cues.

ICCV Workshops(2011)

Abstract
We present an approach for the automatic annotation of commercial videos from the arts-and-crafts domain with the aid of textual descriptions. The main focus is on recognizing both manipulation actions (e.g. cut, draw, glue) and the tools used to perform them (e.g. markers, brushes, glue bottle). We demonstrate how multiple visual cues, such as motion descriptors, object presence, and hand poses, can be combined with the help of contextual priors that are automatically extracted from associated transcripts or online instructions. Using these diverse features and linguistic information, we propose several increasingly complex computational models for recognizing elementary manipulation actions and composite activities, as well as their temporal order. The approach is evaluated on a novel dataset comprising 27 episodes of PBS Sprout TV, each containing on average 8 manipulation actions.
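The abstract does not say how the visual cues and the text-derived priors are combined. The sketch below is a minimal illustration, assuming a simple log-linear (naive-Bayes-style) late fusion: each cue supplies a probability per action, the transcript supplies a smoothed prior from crude verb matching, and the weighted log scores are summed and renormalized. The action list, cue names, score values, prefix-matching heuristic, and the functions `text_prior` and `fuse` are all hypothetical and are not the authors' models.

```python
import math

# Hypothetical label set; the paper's actual action vocabulary may differ.
ACTIONS = ["cut", "draw", "glue", "paint"]


def text_prior(transcript, actions, alpha=1.0):
    """Hypothetical prior P(action | transcript) from add-alpha smoothed verb counts.

    Tokens are matched by prefix (e.g. "cutting" counts toward "cut"),
    a crude stand-in for proper stemming.
    """
    tokens = [t.strip(".,!?;:").lower() for t in transcript.split()]
    counts = {a: sum(1 for t in tokens if t.startswith(a)) for a in actions}
    total = sum(counts[a] + alpha for a in actions)
    return {a: (counts[a] + alpha) / total for a in actions}


def fuse(cue_scores, prior, weights=None, eps=1e-9):
    """Log-linear fusion: log prior plus weighted log cue probabilities, renormalized."""
    weights = weights or {}
    log_post = {}
    for a in prior:
        s = math.log(prior[a])
        for cue, probs in cue_scores.items():
            s += weights.get(cue, 1.0) * math.log(probs.get(a, 0.0) + eps)
        log_post[a] = s
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {a: math.exp(v - m) / z for a, v in log_post.items()}


# Usage with made-up per-cue classifier outputs for one video segment.
prior = text_prior("Now we cut the paper and cut again, then glue it.", ACTIONS)
cues = {
    "motion": {"cut": 0.4, "draw": 0.3, "glue": 0.2, "paint": 0.1},
    "hand_pose": {"cut": 0.5, "draw": 0.2, "glue": 0.2, "paint": 0.1},
}
posterior = fuse(cues, prior)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Per-cue weights let a more reliable cue dominate; with uninformative (uniform) visual scores the fused result reduces to the text prior alone.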
Keywords
art,feature extraction,image motion analysis,video retrieval,video signal processing,automatic annotation,commercial video,craft,domain-specific textual cue,domain-specific visual cue,hand pose,linguistic information,manipulation action,motion descriptor,multiple visual cues,object presence