Article · HuggingFace Blog

Liger GRPO meets TRL

A report on debugging the Liger GRPO loss under DeepSpeed ZeRO-3 with Qwen/Qwen2.5-0.5B-Instruct in bf16, where training fails with a shape mismatch. The traceback passes through grpo_loss and fused_linear_ppo, which points to the forward pass of the Liger kernel. No fix is shown, but the report identifies those two code paths as the ones to inspect.

Published 25 May 2025
Read the source: huggingface.co/blog/liger-grpo
Source: HuggingFace Blog
Ingested: 25 May 2025, 19:10
Editorial score: 3.0 / 5