VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
CoRR (2023)
Abstract
In recent years, online Video Instance Segmentation (VIS) methods have shown
remarkable advancement with their powerful query-based detectors. Utilizing the
output queries of the detector at the frame level, these methods achieve high
accuracy on challenging benchmarks. However, we observe that these methods rely
heavily on location information, which leads to incorrect matching when
positional cues are insufficient to resolve ambiguities. To address this issue,
we present VISAGE, which enhances instance association by explicitly leveraging
appearance information. Our method generates queries that embed appearance
information from backbone feature maps, which are then used in our proposed
simple tracker for robust association. By resolving the over-reliance on
location information and enabling accurate matching in complex scenarios, we
achieve competitive performance on multiple VIS benchmarks; for instance, our
method achieves 54.5 AP on YTVIS19 and 50.8 AP on YTVIS21. Furthermore, to
highlight appearance awareness, which is not fully addressed by existing
benchmarks, we generate a synthetic dataset on which our method significantly
outperforms others by leveraging the appearance cue. Code will be made
available at https://github.com/KimHanjung/VISAGE.