LLMs · Deep Dive · 01/14/2026 · 8 min read

Compute-Optimal Training: Scaling Laws Revisited

A fresh look at how to spend a fixed compute budget between model size and training tokens, and why many recent models were undertrained.

Hoffmann et al. · DeepMind · NeurIPS 2022

Mara Chen

Editor, ML researcher


For years the default instinct was simple: make the model bigger. This paper argues that for a fixed compute budget, parameters and training tokens should scale together in roughly equal measure, and that many headline models were trained on far too little data for their size.
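Below is a minimal sketch of what "scale together" means in numbers. It rests on two assumptions. The first is the standard approximation that training compute is C ≈ 6ND FLOPs for N parameters and D tokens. The second is a fixed ratio of about 20 tokens per parameter, a rule of thumb consistent with Chinchilla's 70B parameters trained on 1.4T tokens rather than the paper's exact fitted law. The function name `compute_optimal_allocation` is illustrative.

```python
import math

# Assumption: training FLOPs C ~= 6 * N * D (forward + backward pass per token).
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: ~20 training tokens per parameter, a rule of thumb consistent
# with Chinchilla (70B parameters, 1.4T tokens); not an exact fitted constant.
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(compute_flops: float) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) with D = r * N.

    Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N**2,
    so N = sqrt(C / (6 * r)) and both N and D grow as sqrt(C).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 6 * 70e9 * 1.4e12  # Chinchilla-scale budget, about 5.9e23 FLOPs
    for scale in (1, 10, 100):
        n, d = compute_optimal_allocation(budget * scale)
        print(f"{scale:>4}x compute -> {n / 1e9:8.1f}B params, {d / 1e12:7.2f}T tokens")
```

Under these assumptions, every 10x increase in compute multiplies both parameters and tokens by about 3.2x (the square root of 10). This is the "equal measure" scaling in practice.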

The core finding

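By fitting loss curves across more than 400 training runs (models from 70 million to over 16 billion parameters, trained on 5 to 500 billion tokens), the authors estimate how the loss-minimizing split between parameters and tokens changes with compute. The takeaway: a smaller model trained on more data can outperform a larger, undertrained one at the same cost. Their 70B-parameter Chinchilla, trained on 1.4 trillion tokens, outperformed the 280B-parameter Gopher on the same compute budget.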
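The comparison can be sketched with the paper's parametric loss form, L(N, D) = E + A/N^α + B/D^β. The constants below are the fitted values reported for the paper's third estimation approach (E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28). Treat them as illustrative and check them against the paper before reuse. Both allocations spend the same budget under the assumption C ≈ 6ND. The helper names `predicted_loss` and `tokens_for_budget` are hypothetical.

```python
# Illustrative constants for L(N, D) = E + A / N**alpha + B / D**beta,
# as reported for the paper's third fitting approach (verify before reuse).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Parametric pre-training loss as a function of parameters and tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Tokens affordable for a given model size, assuming C ~= 6 * N * D."""
    return compute_flops / (6.0 * n_params)


if __name__ == "__main__":
    budget = 6 * 70e9 * 1.4e12  # same FLOP budget for both configurations
    for n in (70e9, 280e9):  # smaller and longer-trained vs larger and shorter-trained
        d = tokens_for_budget(budget, n)
        print(f"N={n / 1e9:.0f}B, D={d / 1e9:.0f}B tokens -> loss {predicted_loss(n, d):.3f}")
```

With these constants, the 70B allocation gets the lower predicted loss. The larger model's gain from extra parameters is smaller than what it loses by seeing a quarter of the data.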

Practical signal: before scaling parameters, check whether you are token-starved. Doubling data is often cheaper than doubling width.
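Here is one rough way to run that check. It assumes the ~20 tokens-per-parameter rule of thumb and the common estimate that a transformer has about 12 · n_layers · d_model² non-embedding parameters. The function names, the threshold, and the 48-layer example model are hypothetical. Under these assumptions, doubling the token count doubles training FLOPs, while doubling d_model roughly quadruples the parameter count and therefore quadruples training FLOPs at a fixed token count.

```python
def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return n_tokens / n_params


def is_token_starved(n_params: float, n_tokens: float, target_ratio: float = 20.0) -> bool:
    """Hypothetical check: flag models trained on fewer tokens than the target ratio implies."""
    return tokens_per_param(n_params, n_tokens) < target_ratio


def approx_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding transformer parameter count, ~12 * n_layers * d_model**2."""
    return 12 * n_layers * d_model**2


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, C ~= 6 * N * D."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # Gopher-like (280B params, 300B tokens) vs Chinchilla-like (70B, 1.4T).
    for name, n, d in [("280B / 300B", 280e9, 300e9), ("70B / 1.4T", 70e9, 1.4e12)]:
        print(f"{name}: {tokens_per_param(n, d):.1f} tokens/param, starved={is_token_starved(n, d)}")

    # Doubling data vs doubling width for a hypothetical 48-layer, d_model=4096 model.
    base_n, base_d = approx_params(48, 4096), 200e9
    base = train_flops(base_n, base_d)
    print("2x data :", train_flops(base_n, 2 * base_d) / base)
    print("2x width:", train_flops(approx_params(48, 8192), base_d) / base)
```

The wider model also costs about four times as much per token to serve. The extra data does not change serving cost at all.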

Why it matters

The result reframes efficiency. Inference cost scales with parameters, so a compute-optimal smaller model is also cheaper to serve, a rare win-win for labs and product teams alike.
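The serving side can be sketched with the common approximation that a forward pass costs about 2N FLOPs per generated token. This ignores attention over long contexts, KV-cache traffic, and memory-bandwidth effects. The function name `inference_flops` and the lifetime token volume are hypothetical. The sketch only illustrates that per-token inference cost is linear in N, so a Chinchilla-sized 70B model costs roughly a quarter as much per token to serve as a Gopher-sized 280B model.

```python
def inference_flops(n_params: float, n_tokens_served: float) -> float:
    """Approximate forward-pass compute: ~2 FLOPs per parameter per token.

    Ignores attention-over-context terms, KV-cache traffic, and batching
    effects, which matter in practice but do not change the linear trend in N.
    """
    return 2.0 * n_params * n_tokens_served


if __name__ == "__main__":
    served = 1e12  # hypothetical: one trillion tokens served over a model's lifetime
    small = inference_flops(70e9, served)
    large = inference_flops(280e9, served)
    print(f"70B: {small:.2e} FLOPs, 280B: {large:.2e} FLOPs, ratio {large / small:.1f}x")
```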

Citation

Hoffmann, J. et al. (2022). Training Compute-Optimal Large Language Models. NeurIPS 2022. arXiv:2203.15556.
