Compute-Optimal Training: Scaling Laws Revisited
A fresh look at how to split a fixed compute budget between model size and training tokens, and why many recent models were undertrained.
Hoffmann et al. · NeurIPS 2025
Topic collections
Filter the archive by discipline, scan compact summaries with citations, and save the papers you want to revisit.
A practical guide to component-level responsiveness: when to use container queries, how to structure containers, and where they outperform viewport breakpoints.
How routing tokens to a small subset of expert networks delivers larger effective capacity without proportional compute cost.
Fedus, Zoph et al. · JMLR 2025
A no-hype framework for adding AI to research, writing, support, and internal workflows without creating a second operating system to maintain.
Replacing the U-Net backbone in diffusion models with a transformer improves scaling and sample quality across image benchmarks.
Peebles, Xie · ICCV 2025
Training assistants to critique and revise their own outputs against a written set of principles, reducing reliance on human preference labels.
Bai et al. · 2025
Checkout speed is not just fewer fields. It is ordering the right questions, reducing uncertainty, and validating at the moment a user can recover.
Learning robust manipulation policies from logged data without risky online exploration on physical hardware.
Kumar et al. · CoRL 2025
A cross-lab survey of how multimodal foundation models are being adapted to output actions for agents and robots, and where they break.
Brohan et al. · 2025 survey