Scaling Laws in Deep Learning

Overview

Scaling laws are empirical relationships that describe how a model's loss changes as training compute, dataset size, and parameter count increase.

Key Papers

  • Kaplan et al. (2020): Scaling Laws for Neural Language Models
  • Hoffmann et al. (2022): Training Compute-Optimal Large Language Models (the "Chinchilla" paper)

Observations

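  • Test loss falls as a power law in model size, dataset size, and training compute, provided the other two factors are not the bottleneck (Kaplan et al., 2020).
  • For a fixed compute budget there is an optimal split between model size and training data; Hoffmann et al. (2022) find that parameters and training tokens should be scaled in roughly equal proportion.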
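
The two observations above can be made concrete with a small sketch. It is illustrative only and rests on assumptions that are not stated on this page: the function names `fit_power_law` and `compute_optimal_split` are hypothetical, the data is synthetic rather than measured, training cost is approximated as C ≈ 6·N·D FLOPs (a common approximation for dense transformers), and the default of 20 tokens per parameter is a widely quoted rule of thumb associated with Hoffmann et al. (2022), not an exact constant.

```python
import numpy as np


def fit_power_law(x, loss):
    """Fit loss ≈ a * x**(-alpha) by least squares in log-log space.

    A pure power law is a straight line on log-log axes, so a degree-1
    polynomial fit recovers the exponent (slope) and prefactor (intercept).
    """
    slope, intercept = np.polyfit(np.log(x), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


def compute_optimal_split(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget between parameters N and training tokens D.

    Assumes C ≈ 6 * N * D and a fixed ratio D = tokens_per_param * N.
    Both are rules of thumb, not exact results.
    """
    n_params = np.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return float(n_params), float(n_tokens)


if __name__ == "__main__":
    # Synthetic, noise-free data for illustration (not measured results).
    sizes = np.array([1e6, 1e7, 1e8, 1e9])
    losses = 10.0 * sizes ** (-0.07)

    a, alpha = fit_power_law(sizes, losses)
    print(f"fitted prefactor a={a:.3f}, exponent alpha={alpha:.3f}")

    n, d = compute_optimal_split(1e21)
    print(f"params={n:.3e}, tokens={d:.3e}")
```

Fitting in log-log space keeps the example short, but it weights every point equally in log terms and ignores any irreducible loss floor. Real fits often add a constant offset term and use nonlinear least squares instead.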