Difference-Masking: Choosing What to Mask in Continued Pretraining.

CoRR (2023)

Abstract
Self-supervised learning (SSL) and the objective of masking-and-predicting in particular have led to promising SSL performance on a variety of downstream tasks. However, while most approaches randomly mask tokens, there is strong intuition from the field of education that deciding what to mask can substantially improve learning outcomes. We introduce Difference-Masking, an approach that automatically chooses what to mask during continued pretraining by considering what makes an unlabelled target domain different from the pretraining domain. Empirically, we find that Difference-Masking outperforms baselines on continued pretraining settings across four diverse language and multimodal video tasks. The cross-task applicability of Difference-Masking supports the effectiveness of our framework for SSL pretraining in language, vision, and other domains.
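The abstract does not describe the scoring mechanism in detail; as a rough, illustrative sketch only, the snippet below ranks tokens by a smoothed target-vs-pretraining frequency ratio and masks the highest-scoring ones. The function names, the ratio-based score, and the toy corpora are assumptions made for illustration, not the paper's actual procedure.

```python
from collections import Counter

def difference_scores(target_docs, pretrain_docs, smoothing=1.0):
    """Score each token by how much more frequent it is in the target-domain
    corpus than in the general pretraining corpus (illustrative heuristic).
    Higher score = more characteristic of the target domain."""
    target_counts = Counter(tok for doc in target_docs for tok in doc.split())
    pretrain_counts = Counter(tok for doc in pretrain_docs for tok in doc.split())
    target_total = sum(target_counts.values()) or 1
    pretrain_total = sum(pretrain_counts.values()) or 1
    return {
        tok: (count / target_total)
        / ((pretrain_counts[tok] + smoothing) / pretrain_total)
        for tok, count in target_counts.items()
    }

def choose_masks(tokens, scores, mask_ratio=0.15):
    """Select positions to mask, preferring tokens that make the target
    domain most different from the pretraining domain."""
    n_mask = max(1, int(len(tokens) * mask_ratio))
    ranked = sorted(range(len(tokens)),
                    key=lambda i: scores.get(tokens[i], 0.0),
                    reverse=True)
    return set(ranked[:n_mask])

# Usage: replace the chosen positions with [MASK] before the MLM step.
tokens = "the patient presented with acute myocardial infarction".split()
scores = difference_scores(
    target_docs=["acute myocardial infarction treated with stent"],
    pretrain_docs=["the cat sat on the mat", "the weather is nice today"],
)
mask_positions = choose_masks(tokens, scores)
masked = ["[MASK]" if i in mask_positions else t for i, t in enumerate(tokens)]
print(masked)
```

In this toy example, domain-specific medical terms receive high scores and are masked, while common function words shared with the general corpus are left intact.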
Keywords
continued pretraining, difference-masking