FeedThis weekArticle
articleHuggingFace Blog

Welcome EmbeddingGemma, Google's new efficient embedding model

Google introduces EmbeddingGemma, a 308M-parameter multilingual embedding model optimized for on-device use with a 2K context window. It supports 100+ languages and achieves top scores on the MMTEB/MTEB benchmarks while staying under 200 MB RAM when quantized. The model uses a Gemma3-based encoder with mean pooling and an optional 768-d output that can be truncated to 512/256/128 for speed and memory efficiency, enabling fast on-device retrieval and RAG pipelines.

published SEP 04, 2025★★★★★
Read the sourcehuggingface.co/blog/embeddinggemma
[*] Opens in a new tab · no tracking on Lantern's side
Source
HuggingFace Blog
Ingested
SEP 04, 2025 · 19:10
Editorial score
5.0 / 5