UC: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training - Supplementary Material

Semantic Scholar (2021)

Abstract
Multilingual Image-Text Retrieval. During fine-tuning, we train and evaluate the pre-trained UC on Multi30K [4, 3, 1] and MSCOCO [2, 8, 6]. When fine-tuning UC on both datasets, we use a batch size of 40 and sample 2 negative image-text pairs for each sampled positive image-text pair. The pre-trained model is optimized with the Adam optimizer, with the learning rate set to 1e-4 and a linear warm-up over the first 10% of fine-tuning steps. For the Cross-Lingual Zero-Shot setting, the pre-trained UC is fine-tuned on English-only training data for 30K steps. For the All-Language setting, we train UC on the training data in all languages for 50K steps. The fine-tuning is run on 8 Nvidia V100 GPUs.
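To make the optimizer schedule concrete, the following is a minimal PyTorch sketch of the fine-tuning configuration described above (Adam, learning rate 1e-4, linear warm-up over the first 10% of steps, 30K steps in the zero-shot setting). This is an illustrative assumption, not the authors' released code; the model, data, and loss are stand-in placeholders.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 30_000              # 30K for Cross-Lingual Zero-Shot (50K for All-Language)
WARMUP_STEPS = TOTAL_STEPS // 10  # linear warm-up for the first 10% of fine-tuning

# Placeholder for the pre-trained UC model; the real model is a cross-modal transformer.
model = torch.nn.Linear(768, 768)
optimizer = Adam(model.parameters(), lr=1e-4)

def linear_warmup(step: int) -> float:
    """Learning-rate scale: ramp up linearly during warm-up, then hold the base rate."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return 1.0

scheduler = LambdaLR(optimizer, lr_lambda=linear_warmup)

for step in range(TOTAL_STEPS):
    # In the actual fine-tuning, each batch of 40 contains positive image-text pairs,
    # each paired with 2 sampled negatives; here a placeholder loss stands in.
    optimizer.zero_grad()
    loss = model(torch.randn(40, 768)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The sketch only holds the learning rate constant after warm-up; the paper does not specify a decay schedule beyond the warm-up phase.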