Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?
CoRR(2023)
摘要
Data augmentation via back-translation is common when pretraining
Vision-and-Language Navigation (VLN) models, even though the generated
instructions are noisy. But: does that noise matter? We find that nonsensical
or irrelevant language instructions during pretraining can have little effect
on downstream performance for both HAMT and VLN-BERT on R2R, and is still
better than only using clean, human data. To underscore these results, we
concoct an efficient augmentation method, Unigram + Object, which generates
nonsensical instructions that nonetheless improve downstream performance. Our
findings suggest that what matters for VLN R2R pretraining is the quantity of
visual trajectories, not the quality of instructions.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要