FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
arXiv (2023)
Abstract
Despite the massive success of fine-tuning Pre-trained Language Models
(PLMs), they remain susceptible to out-of-distribution inputs. Dataset
cartography is a simple yet effective dual-model approach that improves the
robustness of fine-tuned PLMs. It involves fine-tuning a model on the original
training set (i.e., the reference model), selecting a subset of important training
instances based on the training dynamics, and fine-tuning again only on these
selected examples (i.e., the main model). However, this approach requires
fine-tuning the same model twice, which is computationally expensive for large
PLMs. In this paper, we show that (1) training dynamics are highly transferable
across model sizes and pre-training methods, and that (2) fine-tuning main
models using these selected training instances achieves higher training
efficiency than empirical risk minimization (ERM). Building on these
observations, we propose a novel fine-tuning approach: Fine-Tuning by
transFerring Training dynamics (FTFT). Compared with dataset cartography, FTFT
uses more efficient reference models and aggressive early stopping. FTFT
achieves robustness improvements over ERM while lowering the training cost by
up to ∼ 50%.
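To make the selection step concrete, below is a minimal sketch of how dataset-cartography statistics (the reference model's confidence and variability on each example's gold label across epochs) can be computed and used to pick training instances. The function name, the choice of keeping the most ambiguous (highest-variability) examples, and the selection fraction are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def select_by_training_dynamics(epoch_probs: np.ndarray, fraction: float = 0.33) -> np.ndarray:
    """Select training instances from dataset-cartography statistics.

    epoch_probs: array of shape (num_epochs, num_examples) holding the
    reference model's probability of the gold label for each training
    example at the end of each fine-tuning epoch.
    Returns the indices of the selected examples.
    """
    # Confidence: mean gold-label probability across epochs (not used for
    # selection in this sketch, but part of the cartography statistics).
    confidence = epoch_probs.mean(axis=0)
    # Variability: standard deviation of the gold-label probability across epochs.
    variability = epoch_probs.std(axis=0)

    # Assumption: keep the most "ambiguous" examples, i.e. those whose
    # gold-label probability fluctuates the most during fine-tuning.
    num_selected = int(fraction * epoch_probs.shape[1])
    return np.argsort(-variability)[:num_selected]

# Example: dynamics recorded over 3 epochs for 5 training examples.
probs = np.array([
    [0.90, 0.20, 0.50, 0.80, 0.40],
    [0.95, 0.30, 0.70, 0.85, 0.20],
    [0.97, 0.25, 0.40, 0.90, 0.60],
])
print(select_by_training_dynamics(probs, fraction=0.4))
```

In FTFT, the key point is that these statistics can come from a smaller or cheaper reference model than the main model, since the paper finds training dynamics transfer across model sizes and pre-training methods.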