Article · HuggingFace Blog
Vision Language Model Alignment in TRL
TRL expands Vision Language Model (VLM) alignment with Mixed Preference Optimization (MPO), Group Relative Policy Optimization (GRPO), and Group Sequence Policy Optimization (GSPO), moving beyond pairwise DPO to richer training signals. It also adds REINFORCE Leave-One-Out (RLOO) and Online DPO for scalable multimodal alignment, plus native SFT support and reproducible training scripts.
Published 07 AUG 2025
Read the source: huggingface.co/blog/trl-vlm-alignment
- Source: HuggingFace Blog
- Ingested: 07 AUG 2025 · 19:10
- Editorial score: 4.0 / 5