Domain Adaptation for Large-Vocabulary Object Detectors
arXiv (2024)
Abstract
Large-vocabulary object detectors (LVDs) aim to detect objects of many
categories; they learn strong objectness features and can locate objects
accurately when applied to various downstream data. However, LVDs often
struggle to recognize the located objects due to domain discrepancies in data
distribution and object vocabulary. On the other hand, recent vision-language
foundation models such as CLIP demonstrate superior open-vocabulary recognition
capability.
capability. This paper presents KGD, a Knowledge Graph Distillation technique
that exploits the implicit knowledge graphs (KG) in CLIP for effectively
adapting LVDs to various downstream domains. KGD consists of two consecutive
stages: 1) KG extraction, which employs CLIP to encode downstream-domain data as
nodes and their feature distances as edges, constructing a KG that explicitly
inherits the rich semantic relations in CLIP; and 2) KG encapsulation, which
transfers the extracted KG into LVDs to enable accurate cross-domain object
classification. In addition, KGD can extract visual and textual KGs
independently, providing complementary vision and language knowledge for object
localization and object classification in detection tasks over various
downstream domains. Experiments over multiple widely adopted detection
benchmarks show that KGD outperforms the state-of-the-art consistently by large
margins.
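As a rough illustration of the KG-extraction stage described above, the sketch below builds a graph whose nodes are feature vectors (random stand-ins for CLIP embeddings of downstream-domain samples) and whose edges are derived from pairwise feature distances. The cosine-similarity measure, the threshold rule, and all names here are illustrative assumptions; the abstract does not specify how edges are formed.

```python
import numpy as np

def build_knowledge_graph(embeddings, sim_threshold=0.2):
    """Sketch of KG extraction: nodes are embeddings, edges connect
    pairs whose cosine similarity exceeds a threshold (an assumed rule)."""
    # L2-normalize so dot products equal cosine similarities
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                        # pairwise cosine similarity
    adj = (sim >= sim_threshold).astype(float)  # threshold into edges
    np.fill_diagonal(adj, 0.0)                 # drop self-loops
    return sim, adj

# Toy stand-ins for CLIP features of five downstream-domain samples.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))
sim, adj = build_knowledge_graph(emb)
print(adj.shape)
```

In the full method, the same construction would be run separately on visual embeddings (of region crops) and textual embeddings (of category prompts), yielding the complementary visual and textual KGs the abstract mentions.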