VALHALLA: Visual Hallucination for Machine Translation Supplemental Material

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)

Abstract
We evaluate the performance of our proposed approach (VALHALLA) using three machine translation datasets: Multi30K [3], Wikipedia Image Text (WIT) [14], and WMT2014 [1]. These datasets present a diverse set of challenges for machine translation: Multi30K requires models to learn to aggregate vision-language information from a relatively small number of training samples, while WIT and WMT contain translation tasks at different data scales. WMT additionally focuses on translating news articles, which may not be as readily grounded in visual data (compared to Multi30K and WIT), and thus presents an especially challenging test bed for MMT systems. Below we provide more details on each dataset.
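As an illustration (not part of the original supplement), the text side of at least the WMT benchmark can be pulled from a public hub; below is a minimal sketch assuming the Hugging Face `datasets` library and its `wmt14` hub identifier. Multi30K and WIT are distributed separately, and the paper does not specify a loading mechanism, so they are not shown here.

```python
# Minimal sketch: loading the WMT'14 English-German news translation
# benchmark via the Hugging Face `datasets` library. This is an
# assumption about tooling, not the authors' pipeline.
from datasets import load_dataset

# WMT'14 En-De: large-scale news text, the least visually groundable
# of the three benchmarks discussed above.
wmt14 = load_dataset("wmt14", "de-en", split="train")

# Each example is a parallel sentence pair keyed by language code,
# e.g. {'de': '...', 'en': '...'}.
print(wmt14[0]["translation"])
```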
Keywords
Vision + language