ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph

Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Other Links: arxiv.org

Abstract:

We propose a knowledge-enhanced approach, ERNIE-ViL, to learn joint representations of vision and language. ERNIE-ViL aims to construct detailed semantic connections (objects, attributes of objects, and relationships between objects in visual scenes) across vision and language, which are essential to vision-language cross-modal task...
