Welcome EmbeddingGemma, Google's new efficient embedding model
Google introduces EmbeddingGemma, a 308M-parameter multilingual embedding model optimized for on-device use with a 2K context window. It supports 100+ languages and achieves top scores on the MMTEB/MTEB benchmarks while staying under 200 MB RAM when quantized. The model uses a Gemma3-based encoder with mean pooling and an optional 768-d output that can be truncated to 512/256/128 for speed and memory efficiency, enabling fast on-device retrieval and RAG pipelines.