Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
CoRR (2023)
Abstract
In recent years, the state-of-the-art in deep learning has been dominated by
very large models that have been pre-trained on vast amounts of data. The
paradigm is very simple: investing more computational resources (optimally)
leads to better performance, and even predictably so; neural scaling laws have
been derived that accurately forecast the performance of a network for a
desired level of compute. This leads to the notion of a 'compute-optimal'
model, i.e. a model that allocates a given level of compute during training
optimally to maximize performance. In this work, we extend the concept of
optimality by allowing for an 'adaptive' model, i.e. a model that can change
its shape during training. By doing so, we can design adaptive models that
optimally traverse between the underlying scaling laws and outpace their
'static' counterparts, leading to a significant reduction in the required
compute to reach a given target performance. We show that our approach
generalizes across modalities and different shape parameters.
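
To make the idea of traversing between scaling laws concrete, here is a minimal toy sketch, not the authors' implementation. It assumes each model shape follows a power law of the form E(C) = a * C^(-b) + c, and that switching shape preserves the current error level; both are simplifying assumptions made here. The parameter values, the shape names, and all function names are hypothetical. At every error level the adaptive strategy follows whichever law currently descends fastest, so its total compute cannot exceed that of any single static shape.

```python
# Hypothetical (a, b, c) parameters for E(C) = a * C**(-b) + c, one per model shape.
# These numbers are made up for illustration; they are not fitted values.
LAWS = {
    "shape_A": (1.0, 0.5, 0.02),
    "shape_B": (4.0, 0.8, 0.02),
}


def descent_rate(error, a, b, c):
    """Return |dE/dC| of one scaling law at the point where it reaches `error`."""
    compute = (a / (error - c)) ** (1.0 / b)  # invert E(C) = a * C**(-b) + c
    return a * b * compute ** (-b - 1.0)


def compute_to_descend(rate_fn, e_start, e_target, steps=10_000):
    """Trapezoidal estimate of the compute spent going from e_start down to e_target."""
    h = (e_start - e_target) / steps
    total = 0.0
    for i in range(steps):
        lo = e_target + i * h
        hi = lo + h
        # dC = dE / |dE/dC|, integrated over the error interval [lo, hi]
        total += 0.5 * h * (1.0 / rate_fn(lo) + 1.0 / rate_fn(hi))
    return total


def static_cost(name, e_start, e_target):
    """Compute needed when a single shape is used for the whole descent."""
    a, b, c = LAWS[name]
    return compute_to_descend(lambda e: descent_rate(e, a, b, c), e_start, e_target)


def adaptive_cost(e_start, e_target):
    """Compute needed when, at each error level, the fastest-descending law is followed."""
    def best_rate(e):
        return max(descent_rate(e, *params) for params in LAWS.values())
    return compute_to_descend(best_rate, e_start, e_target)


if __name__ == "__main__":
    for name in LAWS:
        print(name, static_cost(name, 0.8, 0.05))
    print("adaptive", adaptive_cost(0.8, 0.05))
```

With these made-up parameters, shape_A descends faster at high error and shape_B at low error, so the adaptive schedule switches once and needs less compute than either static choice. In practice a shape change may not preserve the error level exactly, which this sketch ignores.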
Keywords
vision transformer, scaling laws, adaptive strategies