## Overview
Scaling laws describe how model performance changes with compute, data, and parameters.
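A minimal sketch of what "power-law scaling" means in practice, using the model-size fit reported by Kaplan et al. (2020), $L(N) = (N_c / N)^{\alpha_N}$ with $N_c \approx 8.8\times10^{13}$ and $\alpha_N \approx 0.076$ (data and compute assumed non-bottlenecked):

```python
# Sketch of a power-law loss curve in model size N, using the
# L(N) = (N_c / N)**alpha_N fit from Kaplan et al. (2020).
# N_c ~ 8.8e13 and alpha_N ~ 0.076 are the paper's reported constants.

def loss_vs_params(n_params, n_c=8.8e13, alpha=0.076):
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (n_c / n_params) ** alpha

# A power law means doubling N shrinks loss by a *constant* factor 2**-alpha,
# regardless of where you start on the curve:
ratio = loss_vs_params(2e9) / loss_vs_params(1e9)
print(ratio)  # ~0.949, i.e. each doubling buys ~5% lower loss
```

On a log-log plot this relationship is a straight line with slope $-\alpha_N$, which is how the fits are typically visualized.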
## Key Papers
- Kaplan et al. (2020): Scaling Laws for Neural Language Models
- Hoffmann et al. (2022): Training Compute-Optimal Large Language Models ("Chinchilla")
## Observations
- Loss falls as a power law in model size, dataset size, and training compute
- For a fixed compute budget there is an optimal split between model size and training data; Chinchilla finds parameters and tokens should be scaled up roughly in proportion
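The compute-optimal split above can be sketched numerically. This uses the common approximation $C \approx 6ND$ FLOPs for training a model with $N$ parameters on $D$ tokens, plus the roughly 20-tokens-per-parameter rule of thumb associated with the Chinchilla results; both the approximation and the exact ratio are simplifications, not the paper's full fitting procedure:

```python
# Sketch of Chinchilla-style compute-optimal allocation.
# Assumptions: training cost C ~= 6*N*D FLOPs, and the compute-optimal
# token count scales as D ~= k*N with k ~ 20 (a rule-of-thumb summary
# of Hoffmann et al. 2022, not their actual fitted scaling exponents).

def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Split a FLOPs budget C ~= 6*N*D between parameters N and tokens D."""
    # With D = k*N:  C = 6*N*D = 6*k*N**2  =>  N = sqrt(C / (6*k))
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a ~5.76e23 FLOPs budget (roughly Chinchilla's) lands near
# ~70B parameters and ~1.4T tokens, matching the paper's headline model.
n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

The key consequence is that under a square-root relationship like this, a 100x compute increase should go toward roughly 10x more parameters and 10x more data, rather than mostly larger models as earlier Kaplan-style prescriptions suggested.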