Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents
Granite 4.0 3B Vision is a compact vision-language model built for enterprise document understanding, combining language and vision components in a modular design. It introduces ChartNet, a large multimodal chart dataset, and a DeepStack architecture for layered visual feature injection, both aimed at improving table, chart, and key-value extraction. The model ships as a LoRA adapter, which enables text-only fallback and integration with Docling.
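
To make the LoRA point concrete, here is a minimal sketch of how a low-rank adapter sits on top of a frozen base weight and can be switched off to recover the base behavior. It uses the standard LoRA formulation (output = Wx + (alpha / r) * B(Ax)). The class name `LoRALinear`, the `adapter_enabled` flag, and the toy weights are all hypothetical, and the assumption that the model's text-only fallback works by bypassing the adapter in this way is ours, not something the summary above confirms.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Toy linear layer with an optional low-rank (LoRA) update.

    Hypothetical illustration only; not the actual Granite implementation.
    """

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float):
        self.weight = weight          # frozen base weight, shape (out, in)
        self.lora_a = lora_a          # down-projection A, shape (r, in)
        self.lora_b = lora_b          # up-projection B, shape (out, r)
        self.scale = alpha / len(lora_a)  # standard LoRA scaling alpha / r
        self.adapter_enabled = True   # assumed switch for a text-only fallback

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        if not self.adapter_enabled:
            # Adapter bypassed: output is exactly the base layer's output.
            return base
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    layer = LoRALinear(
        weight=[[1.0, 0.0], [0.0, 1.0]],
        lora_a=[[1.0, 1.0]],          # rank r = 1
        lora_b=[[0.5], [-0.5]],
        alpha=2.0,
    )
    print(layer.forward([1.0, 2.0]))  # [4.0, -1.0] with the adapter applied
    layer.adapter_enabled = False
    print(layer.forward([1.0, 2.0]))  # [1.0, 2.0], the base layer alone
```

Because the base weight is never modified, disabling the adapter leaves the underlying language model unchanged, which is the property that makes an adapter-based release convenient for a fallback path.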