## Overview
Scaling laws describe how model performance changes with compute, data, and parameters.
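A minimal sketch of what "power-law scaling" means in practice, using the model-size fit reported by Kaplan et al. (2020), $L(N) = (N_c / N)^{\alpha_N}$ with $N_c \approx 8.8\times10^{13}$ and $\alpha_N \approx 0.076$ (data and compute assumed non-bottlenecked):

```python
# Sketch of a power-law loss curve in model size N, using the
# L(N) = (N_c / N)**alpha_N fit from Kaplan et al. (2020).
# N_c ~ 8.8e13 and alpha_N ~ 0.076 are the paper's reported constants.

def loss_vs_params(n_params, n_c=8.8e13, alpha=0.076):
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (n_c / n_params) ** alpha

# A power law means doubling N shrinks loss by a *constant* factor 2**-alpha,
# regardless of where you start on the curve:
ratio = loss_vs_params(2e9) / loss_vs_params(1e9)
print(ratio)  # ~0.949, i.e. each doubling buys ~5% lower loss
```

On a log-log plot this relationship is a straight line with slope $-\alpha_N$, which is how the fits are typically visualized.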
## Key Papers
- Kaplan et al. (2020): Scaling Laws for Neural Language Models
- Hoffmann et al. (2022): Training Compute-Optimal Large Language Models ("Chinchilla")
## Observations
- Loss falls as a power law in model size, dataset size, and training compute
- For a fixed compute budget there is an optimal split between model size and training data; Chinchilla finds parameters and tokens should be scaled up roughly in proportion
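The compute-optimal split above can be sketched numerically. This uses the common approximation $C \approx 6ND$ FLOPs for training a model with $N$ parameters on $D$ tokens, plus the roughly 20-tokens-per-parameter rule of thumb associated with the Chinchilla results; both the approximation and the exact ratio are simplifications, not the paper's full fitting procedure:

```python
# Sketch of Chinchilla-style compute-optimal allocation.
# Assumptions: training cost C ~= 6*N*D FLOPs, and the compute-optimal
# token count scales as D ~= k*N with k ~ 20 (a rule-of-thumb summary
# of Hoffmann et al. 2022, not their actual fitted scaling exponents).

def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Split a FLOPs budget C ~= 6*N*D between parameters N and tokens D."""
    # With D = k*N:  C = 6*N*D = 6*k*N**2  =>  N = sqrt(C / (6*k))
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a ~5.76e23 FLOPs budget (roughly Chinchilla's) lands near
# ~70B parameters and ~1.4T tokens, matching the paper's headline model.
n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

The key consequence is that under a square-root relationship like this, a 100x compute increase should go toward roughly 10x more parameters and 10x more data, rather than mostly larger models as earlier Kaplan-style prescriptions suggested.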