Article · HuggingFace Blog
Vision Language Models (Better, faster, stronger)
Vision Language Models are getting smaller while becoming more capable, with new architectures enabling any-to-any inputs/outputs, multimodal retrieval and agents. The post surveys models like Chameleon/Lumina-mGPT, Qwen 2.5 Omni (Thinker-Talker), MiniCPM-o 2.6, Janus-Pro-7B, and Kimi-VL-A3B-Thinking, plus MoE decoders, RAG, safety, and new benchmarks (MMT-Bench, MMMU-Pro).
Published 12 May 2025
Read the source: huggingface.co/blog/vlms-2025
- Source: HuggingFace Blog
- Ingested: 12 May 2025, 19:10
- Editorial score: 4.0 / 5