VRP-SAM: SAM with Visual Reference Prompt

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Abstract
In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM can utilize annotated reference images to comprehend specific objects and perform segmentation of those objects in the target image. Note that the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM achieves a breakthrough within the SAM framework by extending its versatility and applicability while preserving SAM's inherent strengths, thus enhancing user-friendliness. To enhance the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to perform segmentation of unseen objects and enabling cross-domain segmentation.
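The abstract describes the VRP encoder only at a high level. As a purely illustrative sketch (not the authors' released code), the following PyTorch module shows one way such an encoder could turn an annotated reference image into prompt embeddings for a promptable segmenter like SAM: reference features are restricted to the annotated region, and a set of learnable query tokens cross-attends first to those reference features and then to the target-image features. All module names, shapes, and hyperparameters here are assumptions for illustration only.

```python
# Hypothetical sketch of a visual-reference-prompt encoder (not the paper's code).
import torch
import torch.nn as nn


class VRPEncoderSketch(nn.Module):
    def __init__(self, feat_dim: int = 256, num_queries: int = 50):
        super().__init__()
        # Learnable query tokens that will become the prompt embeddings.
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim) * 0.02)
        # Cross-attention: queries attend to annotation-conditioned image features.
        self.ref_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.tgt_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, ref_feats, ref_mask, tgt_feats):
        """
        ref_feats: (B, C, H, W) features of the annotated reference image
        ref_mask:  (B, 1, H, W) rasterized annotation (mask/box/scribble/points)
        tgt_feats: (B, C, H, W) features of the target image
        returns:   (B, num_queries, C) prompt embeddings for a mask decoder
        """
        B = ref_feats.shape[0]
        # Keep only reference features inside the annotated region.
        ref_tokens = (ref_feats * ref_mask).flatten(2).transpose(1, 2)  # (B, HW, C)
        tgt_tokens = tgt_feats.flatten(2).transpose(1, 2)               # (B, HW, C)

        q = self.queries.unsqueeze(0).expand(B, -1, -1)                 # (B, Q, C)
        # Absorb the reference object's appearance into the queries...
        q, _ = self.ref_attn(q, ref_tokens, ref_tokens)
        # ...then ground the queries in the target image.
        q, _ = self.tgt_attn(q, tgt_tokens, tgt_tokens)
        return self.norm(q)


if __name__ == "__main__":
    enc = VRPEncoderSketch()
    ref = torch.randn(2, 256, 64, 64)
    mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
    tgt = torch.randn(2, 256, 64, 64)
    print(enc(ref, mask, tgt).shape)  # torch.Size([2, 50, 256])
```

In the actual model, the resulting prompt embeddings would be fed to SAM's mask decoder and trained with segmentation losses such as binary cross-entropy and Dice loss (both listed among the keywords below); that wiring is omitted from this sketch.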
Keywords
Visual Reference, Visual Prompts, Image Object, Target Image, Reference Image, Learnable Parameters, Generalization Capability, Segmentation Performance, Object Segmentation, Scribble, COCO Dataset, Annotation Format, Semantic, Training Set, Image Features, Bounding Box, Target Object, Segmentation Results, Segmentation Task, Random Initialization, Image Encoder, Dice Loss, Foundation Model, Binary Cross Entropy, Binary Cross-entropy Loss, Vision Transformer, Style Image, Self-attention Layer, Base Classes