Auto-Vocabulary Semantic Segmentation
arxiv(2023)
Abstract
Open-ended image understanding tasks have gained significant attention from the
research community, particularly with the emergence of Vision-Language Models.
Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic
segmentation without relying on a fixed vocabulary, and in some cases, they
operate without the need for training or fine-tuning. However, OVS methods
typically require users to specify the vocabulary based on the task or dataset
at hand. In this paper, we introduce Auto-Vocabulary Semantic
Segmentation (AVS), advancing open-ended image understanding by eliminating
the necessity to predefine object categories for segmentation. Our approach
presents a framework that autonomously identifies relevant class names
using enhanced BLIP embeddings; these class names are then used for segmentation.
Given that open-ended object category predictions cannot be directly compared
with a fixed ground truth, we develop a Large Language Model-based
Auto-Vocabulary Evaluator (LAVE) to efficiently evaluate the automatically
generated class names and their corresponding segments. Our method sets new
benchmarks for AVS on PASCAL VOC, PASCAL Context, ADE20K, and Cityscapes,
and is competitive with OVS methods that require specified class names.
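
The evaluation problem described above can be illustrated with a minimal sketch. It is not the paper's LAVE implementation: the abstract only states that an LLM-based evaluator is used, so the lookup table NAME_TO_GT below is a hypothetical stand-in for the LLM, and scoring with mean IoU is an assumption (a common segmentation metric). The names map_name and mean_iou, and the toy labels, are illustrative only.

```python
# Hypothetical stand-in for the LLM step: a fixed lookup from free-form
# predicted names to the dataset's fixed vocabulary. How the actual
# evaluator maps names is not described in the abstract.
NAME_TO_GT = {"puppy": "dog", "kitten": "cat", "sofa": "couch"}


def map_name(name, gt_vocab, background="background"):
    """Map an auto-generated class name onto the fixed vocabulary."""
    if name in gt_vocab:
        return name
    mapped = NAME_TO_GT.get(name)
    # Names with no counterpart in the vocabulary fall back to background.
    return mapped if mapped in gt_vocab else background


def mean_iou(pred, gt, classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in classes:
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


gt_vocab = ["background", "dog", "cat"]
# Flattened per-pixel labels for a tiny 2x3 image (toy data).
pred_names = ["puppy", "puppy", "kitten", "grass", "cat", "cat"]
gt_labels = ["dog", "dog", "cat", "background", "cat", "dog"]

mapped = [map_name(n, gt_vocab) for n in pred_names]
score = mean_iou(mapped, gt_labels, gt_vocab)
print(mapped, round(score, 4))
```

Once every predicted name is mapped into the fixed vocabulary, a standard fixed-vocabulary metric can be computed as usual; the quality of the mapping step then determines how fairly open-ended predictions are scored.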