Article · HuggingFace Blog

Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

AutoRound is Intel's weight-only post-training quantization (PTQ) method. It uses signed gradient descent to jointly optimize weight rounding and clipping ranges for accurate low-bit quantization (INT2 to INT8). The post claims up to 2.1x higher relative accuracy at 2-bit and reports quantizing a 72B model in about 37 minutes on an A100 GPU. It also supports mixed-bit tuning, multiple export formats, and tuning recipes (auto-round-best and auto-round-light) with small calibration sets.
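
To make the idea of jointly tuning rounding and clipping with signed gradient descent concrete, here is a minimal pure-Python sketch for a single weight row. It is an illustration under stated assumptions, not AutoRound's actual implementation: it assumes symmetric quantization with one clipping scale `alpha`, a per-weight rounding offset `v` kept in [-0.5, 0.5], a straight-through estimator for the rounding step, an output-reconstruction loss on toy calibration data, and a step size of `1 / steps`. The function names (`quantize`, `output_loss`, `tune`) are hypothetical.

```python
import math
import random

def quantize(w, v, alpha, bits):
    """Fake-quantize one weight row using rounding offsets v and clip scale alpha."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    scale = alpha * max(abs(x) for x in w) / qmax
    q, mask = [], []
    for wi, vi in zip(w, v):
        r = math.floor(wi / scale + vi + 0.5)           # round to nearest integer
        mask.append(1.0 if qmin <= r <= qmax else 0.0)  # gradient flows only if unclipped
        q.append(min(max(r, qmin), qmax))
    return [scale * qi for qi in q], q, mask, scale

def output_loss(X, w, w_hat):
    """Mean squared error between full-precision and quantized outputs."""
    errs = [sum(x[j] * (w_hat[j] - w[j]) for j in range(len(w))) for x in X]
    return sum(e * e for e in errs) / len(X), errs

def sign(x):
    return (x > 0) - (x < 0)

def tune(X, w, bits=2, steps=200):
    """Signed gradient descent on (v, alpha); returns the best parameters seen."""
    d, n = len(w), len(X)
    lr = 1.0 / steps                      # assumed step size for this sketch
    v, alpha = [0.0] * d, 1.0             # start at plain round-to-nearest
    best_loss, best = float("inf"), (list(v), alpha)
    for _ in range(steps):
        w_hat, q, mask, scale = quantize(w, v, alpha, bits)
        loss, errs = output_loss(X, w, w_hat)
        if loss < best_loss:
            best_loss, best = loss, (list(v), alpha)
        # Gradient of the loss with respect to the dequantized weights.
        g = [2.0 / n * sum(errs[i] * X[i][j] for i in range(n)) for j in range(d)]
        # Straight-through estimator: treat round() as identity when differentiating.
        grad_v = [g[j] * scale * mask[j] for j in range(d)]
        grad_alpha = sum(g[j] * (q[j] - mask[j] * w[j] / scale)
                         for j in range(d)) * scale / alpha
        # Signed updates: only the sign of each gradient is used.
        v = [min(max(v[j] - lr * sign(grad_v[j]), -0.5), 0.5) for j in range(d)]
        alpha = min(max(alpha - lr * sign(grad_alpha), 0.5), 1.0)
    return best, best_loss

# Toy data: 8 weights and 32 random calibration inputs (illustrative only).
random.seed(0)
w = [random.gauss(0, 1) for _ in range(8)]
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(32)]

rtn_loss, _ = output_loss(X, w, quantize(w, [0.0] * 8, 1.0, 2)[0])
(best_v, best_alpha), tuned_loss = tune(X, w, bits=2)
print("round-to-nearest loss:", rtn_loss)
print("tuned loss:", tuned_loss)
```

Because the loop starts from round-to-nearest and keeps the best parameters it has seen, the tuned loss in this sketch can never be worse than the round-to-nearest baseline. How much it improves depends on the data.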

published APR 29, 2025 · ★★★★
Read the source: huggingface.co/blog/autoround
Source
HuggingFace Blog
Ingested
APR 29, 2025 · 19:10
Editorial score
4.0 / 5