Starting Point Selection and Multiple-Standard Matching for Video Object Segmentation with Language Annotation

IEEE Transactions on Multimedia (2023)

Abstract
In this study, we investigate language-level video object segmentation, in which a language annotation of the first frame describes the target object. Because a language label is typically compatible with every frame in a video, the proposed method can choose the most suitable starting frame, which mitigates initialization failure. In addition to the visual feature extracted from a static video frame, we propose a motion-language score based on optical flow to describe moving objects more accurately. The scores from these multiple standards are then aggregated by an attention-based mechanism to predict the final result. The proposed method is evaluated on four widely used video object segmentation datasets (DAVIS 2017, DAVIS 2016, SegTrack V2, and YouTubeObject) and attains a mean region similarity of 67.2% on DAVIS 2017 and 83.5% on DAVIS 2016. The code will be published.
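
The reported metric, mean region similarity, is commonly defined on the DAVIS benchmarks as the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask, averaged over frames. The following is a minimal sketch of that computation, not the authors' evaluation code: the function names, the plain-list mask representation, and the simple per-frame averaging are assumptions made for illustration, and the paper's exact protocol is not stated in the abstract.

```python
from typing import Sequence

# Assumed representation: a binary mask is a sequence of rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Jaccard index (intersection over union) of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_region_similarity(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-frame Jaccard index over a sequence of frames."""
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    if not scores:
        raise ValueError("at least one frame is required")
    return sum(scores) / len(scores)
```

For example, a prediction that covers two pixels where the ground truth covers only one of them scores 0.5 on that frame, and averaging it with a perfectly segmented frame gives 0.75.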
Keywords
Annotations, Proposals, Visualization, Standards, Image segmentation, Object segmentation, Motion segmentation, Starting point, Matching strategy, Video object segmentation, Language annotation