VinVL: Making Visual Representations Matter in Vision-Language Models

Pengchuan Zhang
Xiaowei Hu
Jianwei Yang
Lijuan Wang
Other Links: arxiv.org

Abstract:

This paper presents a detailed study of improving visual representations for vision-language (VL) tasks and develops an improved object detection model that provides object-centric representations of images. Compared to the most widely used bottom-up and top-down model (Anderson et al., 2018), the new model is bigger, better-designed...
