Scaling Laws for Autoregressive Generative Modeling
Abstract:
We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases, autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-l...