GitHub Trending — Python

KellerJordan/modded-nanogpt

Modded-NanoGPT documents a speedrun: training a 124M-parameter NanoGPT to a validation loss of 3.28 on FineWeb in under 90 seconds on 8× H100 GPUs, beating the llm.c baseline. The repository collects dozens of techniques, among them the Muon optimizer, FP8 matrix multiplications, zero-initialized projections, rotary embeddings, and YaRN with attention windowing, along with practical instructions for running it. A second track, for GPT-2 Medium, targets a loss of 2.92.
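The summary only names the Muon optimizer. As publicly described, Muon applies momentum to the gradient of each 2D weight matrix and then replaces that update with an approximately orthogonal matrix (singular values pushed toward 1) computed by a Newton-Schulz iteration. The sketch below is a minimal, dependency-free illustration under stated assumptions: it uses the classic cubic Newton-Schulz iteration, whereas the repository uses a tuned quintic variant in low precision on GPU tensors, and it omits details such as the repository's momentum formulation and update scaling. `newton_schulz_orthogonalize`, `muon_like_step`, and the `lr`/`momentum` defaults are hypothetical names and values chosen for illustration, not the repository's API.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Dot each row of a with each column of b.
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate U @ V^T for g = U S V^T (the orthogonal polar factor).

    Classic cubic Newton-Schulz, NOT Muon's tuned quintic variant.
    """
    tall = len(g) > len(g[0])
    # Work on the wide orientation so the Gram matrix X X^T is the smaller one.
    x = transpose(g) if tall else [row[:] for row in g]
    # Dividing by the Frobenius norm bounds every singular value by 1,
    # inside the cubic iteration's convergence range (0, sqrt(3)).
    norm = math.sqrt(sum(v * v for row in x for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 (X X^T) X maps each singular value s to 1.5 s - 0.5 s^3,
        # which drives every nonzero singular value toward 1.
        ax = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * av for xv, av in zip(xr, ar)] for xr, ar in zip(x, ax)]
    return transpose(x) if tall else x


def muon_like_step(w: Matrix, grad: Matrix, buf: Matrix,
                   lr: float = 0.02, momentum: float = 0.95):
    """Hypothetical Muon-style step: momentum, orthogonalize, then descend."""
    # Plain heavy-ball momentum (an assumption; the repository's formulation differs).
    buf = [[momentum * b + g for b, g in zip(br, gr)] for br, gr in zip(buf, grad)]
    update = newton_schulz_orthogonalize(buf)
    w = [[wv - lr * uv for wv, uv in zip(wr, ur)] for wr, ur in zip(w, update)]
    return w, buf
```

Transposing tall matrices first keeps the Gram matrix `X Xᵀ` at the smaller dimension, and the Frobenius normalization ensures every singular value starts at or below 1, where the cubic iteration converges. For a full-rank 2×3 input, the rows of the result come out orthonormal to within floating-point error.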

Published 30 Apr. 2026 · ★★★★
Read the source: github.com/KellerJordan/modded-nanogpt
Ingested: 30 Apr. 2026 · 04:08
Editorial score: 4.0 / 5