Large-Scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
European Conference on Computer Vision (2020)
Abstract
Prior work in visual dialog has focused on training deep neural models on VisDial in isolation. Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog. We adapt the recently proposed ViLBERT model for multi-turn visually-grounded conversations. Our model is pretrained on the Conceptual Captions and Visual Question Answering datasets, and finetuned on VisDial. Our best single model outperforms prior published work by 1% absolute on NDCG and MRR.
Keywords
visual dialog, large-scale, state-of-the-art