Article · HuggingFace Blog

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

TRL can now run vLLM co-located with training, sharing the same GPUs to eliminate idle time and HTTP overhead. Because vLLM is embedded in the same process group as the trainer, training and inference take turns on the same devices. The setup is compatible with torchrun, supports tensor parallelism (TP) and data parallelism (DP), and works with GRPO workflows. The article covers the design, implementation notes, and benchmarks on 1.5B, 7B, and 72B models, plus a train_grpo_colocate.py script to try.
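To make the idle-time claim concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the article: the `utilization` helper, the 8-GPU count, the 2/6 split, and the equal phase durations are all hypothetical, and it assumes that in a split setup generation and training run strictly one after the other.

```python
def utilization(phases, n_gpus):
    """Fraction of available GPU-time spent working during one step.

    phases: list of (duration, active_gpus) tuples, executed one after another.
    """
    busy = sum(duration * active for duration, active in phases)
    total = n_gpus * sum(duration for duration, _ in phases)
    return busy / total


N_GPUS = 8  # hypothetical cluster size

# Split setup (assumed): 2 GPUs generate while 6 wait, then 6 train while 2 wait.
split = [(1.0, 2), (1.0, 6)]

# Co-located setup (idealized): all 8 GPUs generate, then all 8 train.
colocated = [(1.0, 8), (1.0, 8)]

print(utilization(split, N_GPUS))      # 0.5
print(utilization(colocated, N_GPUS))  # 1.0
```

The model is deliberately idealized: it ignores memory pressure from sharing devices and the fact that phase durations change when more GPUs take part in each phase. The article's benchmarks are the reference for real numbers.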

Published: 03 June 2025

Read the source: huggingface.co/blog/vllm-colocate

Source: HuggingFace Blog
Ingested: 03 June 2025 · 19:10
Editorial score: 4.0 / 5