ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Abstract:
We propose a knowledge-enhanced approach, ERNIE-ViL, to learn joint representations of vision and language. ERNIE-ViL tries to construct the detailed semantic connections (objects, attributes of objects and relationships between objects in visual scenes) across vision and language, which are essential to vision-language cross-modal task...