Augmented Spatial Context Fusion Network for Scene Graph Generation

IJCNN (2023)

Abstract
Scene graph generation provides high-order semantic information by understanding the objects in an image and the relations between them. To improve performance, context fusion is widely used in scene graph generation, with LSTM and Vision-Transformer as the most common fusion modules. Both realize context fusion by stacking multiple basic units, which requires learning a large number of parameters. However, because scene graph generation is a mid-level semantic understanding task that supports downstream tasks, its computational efficiency is crucial. To simplify context fusion, this paper proposes ASCF-Net (Augmented Spatial Context Fusion Network), which computes the spatial context of a designated object by searching for its nearest-neighbor objects with high relevance and strengthens the context with random noise. Without any learnable parameters, this computation essentially simulates the attention mechanism. Experiments on the VG dataset show that, on the same baseline, ASCF-Net uses 15.26% of the parameters of Bi-LSTM and 13.34% of the parameters of Vision-Transformer for context fusion while achieving higher performance than either fusion module. At the same time, with this simple fusion module, ASCF-Net obtains results on the VG dataset that are competitive with mainstream scene graph generation models.
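As a rough illustration of the fusion step the abstract describes, the sketch below is a hypothetical, parameter-free version of it: the exact relevance measure, neighbor count, and noise distribution are not given in the abstract, so the function `spatial_context`, the neighbor count `k`, and the Gaussian `noise_std` are all assumptions. It selects the k spatially nearest objects, weights them by distance in place of learned attention scores, and perturbs the fused context with random noise.

```python
import numpy as np

def spatial_context(features, boxes, obj_idx, k=5, noise_std=0.1, rng=None):
    """Illustrative, parameter-free spatial context fusion (not the paper's exact method).

    features: (N, D) object appearance features
    boxes:    (N, 4) bounding boxes as (x1, y1, x2, y2)
    obj_idx:  index of the designated object
    k:        number of spatial nearest neighbors to fuse (assumed hyperparameter)
    """
    rng = rng or np.random.default_rng()
    # Box centers serve as each object's spatial position.
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    # Euclidean distances from the designated object to all others.
    dists = np.linalg.norm(centers - centers[obj_idx], axis=1)
    dists[obj_idx] = np.inf  # exclude the object itself
    neighbors = np.argsort(dists)[:k]
    # Distance-based weights stand in for (unlearned) attention scores.
    scaled = dists[neighbors] / (dists[neighbors].mean() + 1e-6)
    weights = np.exp(-scaled)
    weights /= weights.sum()
    context = weights @ features[neighbors]
    # Random-noise augmentation of the fused context, per the abstract's idea.
    return context + rng.normal(0.0, noise_std, size=context.shape)

# Toy usage with random features and boxes.
feats = np.random.rand(10, 256)
boxes = np.random.rand(10, 4) * 100.0
ctx = spatial_context(feats, boxes, obj_idx=0)
```

Because nothing here is learned, the neighbor search and distance weighting play the role that stacked LSTM or Vision-Transformer units play in conventional context fusion, which is where the claimed parameter savings come from.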
Keywords
Scene graph generation, Lightweight contextual module, Spatial nearest neighbor fusion, VG dataset