CKT-RCM: Clip-Based Knowledge Transfer and Relational Context Mining for Unbiased Panoptic Scene Graph Generation

Nanhao Liang, Yong Liu, Wenfang Sun, Yingwei Xia, Fan Wang

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract

Panoptic Scene Graph (PSG) generation aims to generate a scene graph representing the pairwise relationships between objects within an image. Its use of pixel-wise segmentation masks and its inclusion of background regions in relationship inference have quickly made it a popular approach. However, it faces an intrinsic challenge: because typical datasets follow a long-tail distribution, the trained relationship predictors are either of low value or of low quality. Inspired by how humans use prior knowledge to greatly simplify this problem, we introduce two novel designs: a pre-trained vision-language model to correct the data skew, and a conditional prior distribution over contexts to further refine prediction quality. Specifically, our approach, named CKT-RCM, first exploits relation-associated visual features from the image encoder and constructs a relation classifier by extracting text embeddings for all relationships from the text encoder of the vision-language model. It also uses the rich relational context of subject-object pairs to facilitate informative relation predictions via a cross-attention mechanism. We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance.
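
The two ideas in the abstract, a relation classifier built from text embeddings and context refinement through cross-attention, can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the single-head attention with a residual connection, the temperature value, and the three-dimensional vectors are all assumptions made for illustration. In practice the embeddings would come from the image and text encoders of a pre-trained vision-language model.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, context):
    """Single-head scaled dot-product attention of one query over context
    vectors, with a residual connection (an assumed design choice)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in context])
    attended = [
        sum(w * c[i] for w, c in zip(weights, context))
        for i in range(len(query))
    ]
    return [q + a for q, a in zip(query, attended)]


def classify_relation(pair_feature, text_embeddings, temperature=0.07):
    """Score a subject-object feature against one text embedding per relation
    using cosine similarity; the temperature is an assumed hyperparameter."""
    f = l2_normalize(pair_feature)
    logits = [dot(f, l2_normalize(t)) / temperature for t in text_embeddings]
    return softmax(logits)


# Toy stand-ins for encoder outputs (hypothetical values).
RELATIONS = ["on", "holding", "beside"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pair_feature = [0.2, 0.9, 0.1]                  # visual feature of one pair
context = [[0.0, 1.0, 0.0], [0.1, 0.8, 0.1]]    # relational context vectors

refined = cross_attend(pair_feature, context)
probs = classify_relation(refined, text_embeddings)
predicted = RELATIONS[probs.index(max(probs))]
```

Because the classifier weights are text embeddings rather than parameters learned only from the skewed training labels, rare relations receive a representation of the same kind as frequent ones, which is the intuition behind using the vision-language model to counter the long-tail distribution.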

Keywords

Scene graph generation, panoptic segmentation, visual-linguistic knowledge, attention mechanism
