Article · HuggingFace Blog

Vision Language Model Alignment in TRL

TRL expands Vision Language Model (VLM) alignment with Mixed Preference Optimization (MPO), Group Relative Policy Optimization (GRPO), and Group Sequence Policy Optimization (GSPO), moving beyond pairwise DPO to richer training signals. It also adds REINFORCE Leave-One-Out (RLOO) and Online DPO for scalable multimodal alignment, plus native SFT support and reproducible training scripts.

Published AUG 07, 2025

Read the source: huggingface.co/blog/trl-vlm-alignment


Source: HuggingFace Blog
Ingested: AUG 07, 2025 · 19:10
Editorial score: 4.0 / 5