UC: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training - Supplementary Material

Semantic Scholar (2021)

Abstract
Multilingual Image-Text Retrieval. During fine-tuning, we train and evaluate the pre-trained UC on Multi30K [4, 3, 1] and MSCOCO [2, 8, 6]. When fine-tuning UC on both datasets, we use a batch size of 40 and sample 2 negative image-text pairs for each sampled positive image-text pair. The pre-trained model is optimized with the Adam optimizer, with the learning rate set to 1e-4 and a linear warm-up over the first 10% of fine-tuning steps. For the Cross-Lingual Zero-Shot setting, the pre-trained UC is fine-tuned on English-only training data for 30K steps. For the All-Language setting, we train UC on the training data in all languages for 50K steps. The fine-tuning is run on 8 Nvidia V100 GPUs.
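To make the optimizer schedule concrete, the following is a minimal PyTorch sketch of the fine-tuning configuration described above (Adam, learning rate 1e-4, linear warm-up over the first 10% of steps, 30K steps in the zero-shot setting). This is an illustrative assumption, not the authors' released code; the model, data, and loss are stand-in placeholders.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 30_000              # 30K for Cross-Lingual Zero-Shot (50K for All-Language)
WARMUP_STEPS = TOTAL_STEPS // 10  # linear warm-up for the first 10% of fine-tuning

# Placeholder for the pre-trained UC model; the real model is a cross-modal transformer.
model = torch.nn.Linear(768, 768)
optimizer = Adam(model.parameters(), lr=1e-4)

def linear_warmup(step: int) -> float:
    """Learning-rate scale: ramp up linearly during warm-up, then hold the base rate."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return 1.0

scheduler = LambdaLR(optimizer, lr_lambda=linear_warmup)

for step in range(TOTAL_STEPS):
    # In the actual fine-tuning, each batch of 40 contains positive image-text pairs,
    # each paired with 2 sampled negatives; here a placeholder loss stands in.
    optimizer.zero_grad()
    loss = model(torch.randn(40, 768)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The sketch only holds the learning rate constant after warm-up; the paper does not specify a decay schedule beyond the warm-up phase.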