HuggingFace Blog
Vision Language Model Alignment in TRL
TRL expands Vision Language Model (VLM) alignment with Mixed Preference Optimization (MPO), Group Relative Policy Optimization (GRPO), and Group Sequence Policy Optimization (GSPO), going beyond pairwise DPO. MPO combines preference, quality, and generation losses, while GRPO and GSPO learn from rewards assigned to groups of sampled completions. TRL also adds VLM support for Reinforce Leave One Out (RLOO) and Online DPO for scalable multimodal alignment, plus native supervised fine-tuning (SFT) support and reproducible training scripts.
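The group-based methods differ mainly in how each sampled completion's reward becomes a learning signal. The sketch below illustrates this in pure Python for the case where several completions are sampled for the same prompt (for a VLM, an image plus text).

- GRPO standardizes each reward against its group's mean and standard deviation.
- RLOO subtracts the mean reward of the other completions.
- GSPO replaces per-token importance ratios with one length-normalized ratio per sequence.

The function names are hypothetical helpers and are not TRL's API. TRL's trainers compute these quantities internally on tensors. The choice of population standard deviation and the `eps` value are assumptions for illustration.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Hypothetical helper: GRPO-style group-relative advantages.

    Each reward is standardized against the mean and (population) std of its
    group; real implementations may use a different std estimator or eps.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Hypothetical helper: RLOO leave-one-out advantages.

    The baseline for completion i is the mean reward of the other k - 1
    completions sampled for the same prompt.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two completions per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def gspo_sequence_ratio(logp_new: List[float], logp_old: List[float]) -> float:
    """Hypothetical helper: GSPO sequence-level importance ratio.

    Computes (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|) from per-token
    log-probabilities, i.e. exp of the mean per-token log-ratio.
    """
    if not logp_new or len(logp_new) != len(logp_old):
        raise ValueError("need equal-length, non-empty log-prob lists")
    mean_log_ratio = sum(n - o for n, o in zip(logp_new, logp_old)) / len(logp_new)
    return math.exp(mean_log_ratio)


if __name__ == "__main__":
    # Four completions for one prompt, scored 1 (correct) or 0 (incorrect).
    rewards = [1.0, 0.0, 0.0, 1.0]
    print("GRPO:", grpo_advantages(rewards))
    print("RLOO:", rloo_advantages(rewards))
    print("GSPO ratio:", gspo_sequence_ratio([-1.0, -2.0], [-1.5, -2.5]))
```

In both schemes, correct completions receive positive advantages and incorrect ones receive negative advantages. RLOO's advantages always sum to zero, and it needs no extra normalization. GSPO's per-sequence ratio keeps the importance weight on the same scale as the sequence-level reward. Per-token ratios would let a few tokens dominate the update.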
Published AUG 07, 2025
Read the source: huggingface.co/blog/trl-vlm-alignment
- Source: HuggingFace Blog
- Ingested: AUG 07, 2025 · 19:10
- Editorial score: 4.0 / 5