Article · HuggingFace Blog

Vision Language Model Alignment in TRL

TRL expands Vision Language Model (VLM) alignment with Mixed Preference Optimization (MPO), Group Relative Policy Optimization (GRPO), and Group Sequence Policy Optimization (GSPO), moving beyond pairwise DPO to richer training signals. It also adds REINFORCE Leave-One-Out (RLOO) and Online DPO for scalable multimodal alignment, plus native SFT support and reproducible training scripts.

Published 07 AUG 2025 · ★★★★★

Read the source: huggingface.co/blog/trl-vlm-alignment


Source: HuggingFace Blog
Ingested: 07 AUG 2025 · 19:10
Editorial score: 4.0 / 5