
© 2026 Lantern·Set in Geist Mono·Sources [52]·Methodology·Privacy

Built solo in Lille, FR·v0.6

Dev & AI feed

The best of dev and AI, scored every day by an agent. Filtered, summarized, ranked. No color, no noise — just the substance.

Issue: No. 139
Date: MAY 19, 2026
Edition: EN · DAILY
Sources: 14 active
Articles: 31 today

§ Feed·Vol. 02·No. 139

Last ingest·08:00 UTC+0·Next·08:00


Articles / day: 31

7-day avg.: 48

Mon → Sun: -83%

Feed · 834 articles

sort by score · DESC ↓

641 · JUN 03 · 00:00

article · HuggingFace Blog · last yr.

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

SmolVLA-450M is an open-source, compact Vision-Language-Action model for robotics that runs on consumer hardware. It is pretrained on publicly licensed community data and trainable on a single consumer GPU, with asynchronous inference boosting throughput and strong performance on simulation and real-world tasks.

★★★★★·HuggingFace Blog

642 · JUN 03 · 00:00

article · HuggingFace Blog · last yr.

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

TRL now runs vLLM co-located with training, sharing GPUs to eliminate idle time and HTTP overhead. By embedding vLLM in the same process group, training and inference take turns on the same devices, with torchrun compatibility, TP/DP support, and GRPO-enabled workflows. The article covers design, implementation notes, and benchmarks (1.5B, 7B, 72B) plus a train_grpo_colocate.py script to try.

★★★★★·HuggingFace Blog

643 · MAY 28 · 00:00

article · HuggingFace Blog · last yr.

CodeAgents + Structure: A Better Way to Execute Actions

This article shows that forcing CodeAgents to generate their thoughts and code inside a structured JSON can outperform traditional approaches on benchmarks such as SmolBench. It explains why structure improves action execution and offers implementation advice and practical use cases for AI developers.

★★★★★·HuggingFace Blog
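
To make the "thoughts and code in a structured JSON" idea concrete, here is a minimal sketch of how an agent loop might validate such a reply before acting on it. The field names `thought` and `code` and the helper `parse_structured_action` are assumptions for illustration; the article's actual schema may differ.

```python
import json


def parse_structured_action(raw: str) -> tuple[str, str]:
    """Parse a model reply expected to be a JSON object with two string fields.

    The field names 'thought' and 'code' are hypothetical, chosen for this sketch.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("thought", "code"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"field {key!r} must be present and a string")
    # Check that the code at least parses as Python, without running it.
    compile(data["code"], "<agent-action>", "exec")
    return data["thought"], data["code"]


reply = '{"thought": "Add the two numbers.", "code": "result = 2 + 3"}'
thought, code = parse_structured_action(reply)
print(thought)
print(code)
```

Rejecting malformed replies early is one plausible reason structure helps: the agent never tries to execute a half-formed action. Actually running the code would still call for a sandbox.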

644 · MAY 25 · 00:00

article · HuggingFace Blog · last yr.

Liger GRPO meets TRL

Report on debugging the Liger GRPO loss with DeepSpeed ZeRO-3 using Qwen/Qwen2.5-0.5B-Instruct in bf16, highlighting a shape mismatch during training. The traceback runs through grpo_loss and fused_linear_ppo, pointing to a forward-pass issue in the Liger kernel. No fix is shown, but those two code paths are the ones to inspect.

★★★★★·HuggingFace Blog

645 · MAY 23 · 00:00

article · HuggingFace Blog · last yr.

Tiny Agents in Python: a MCP-powered agent in ~70 lines of code

This post shows how to build a tiny Python agent powered by MCP. It explains a simple loop that pulls tools from MCP servers using an extended huggingface_hub client and runs via a CLI demo.

★★★★★·HuggingFace Blog

646 · MAY 23 · 00:00

article · HuggingFace Blog · last yr.

Dell Enterprise Hub is all you need to build AI on premises

Dell Enterprise Hub now ships a complete suite of on‑prem AI models and ready‑to‑deploy applications (e.g., Llama 4 Maverick, OpenWebUI), optimized for Dell AI Server platforms via Docker/Kubernetes. It adds an Application Catalog and a dell-ai CLI/Python SDK to run everything from development to on‑prem deployment, plus on‑device models for Dell AI PCs powered by Intel/Qualcomm NPUs.

★★★★★·HuggingFace Blog

647 · MAY 21 · 06:52

article · HuggingFace Blog · last yr.

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Falcon-H1 is a family of six open-source LLMs (0.5B–34B) using a hybrid attention + SSM design, delivering faster inference and lower memory with strong cross-task performance. Available as base and instruction-tuned variants, they support 256K context, 18 native languages (with a tokenizer scalable to 100+), and are optimized for edge-to-large deployments under Apache 2.0.

★★★★★·HuggingFace Blog

648 · MAY 21 · 06:35

article · HuggingFace Blog · last yr.

Falcon-Arabic: A Breakthrough in Arabic Language Models

Falcon-Arabic is a 7B multilingual LLM built on Falcon 3 and optimized for Arabic, covering MSA and dialects. It supports a 32k-token context, enabling long-context tasks like RAG and in-depth content creation, and reportedly outperforms Arabic LLMs of similar size as well as larger ones. Built by TII, it adapts an existing multilingual foundation rather than training from scratch, delivering an efficient, open-source option for Arabic AI.

★★★★★·HuggingFace Blog

649 · MAY 21 · 00:00

article · HuggingFace Blog · last yr.

Exploring Quantization Backends in Diffusers

This article explores quantization backends in Diffusers for heavy diffusion models such as Flux, comparing BF16 with 4-bit and 8-bit quantization. It details the backends (bitsandbytes, GGUF, torchao, Quanto, FP8) and the key components (text encoders and transformer), with memory and inference-time figures that guide the practical choice.

★★★★★·HuggingFace Blog
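
The memory side of the BF16 versus 4-bit/8-bit comparison follows from simple arithmetic, sketched below. The parameter count is a hypothetical round number, not a figure from the article, and the helper `weight_memory_gib` is illustrative only. Real backends also store scales, zero points, and activations, so measured usage will be higher than this weight-only estimate.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical model size, chosen only to make the ratios visible.
params = 12e9

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(params, bits):.1f} GiB")
```

Halving the bits per weight halves the weight footprint, which is why 4-bit variants can fit on GPUs where the BF16 model cannot.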

650 · MAY 21 · 00:00

article · HuggingFace Blog · last yr.

nanoVLM: The simplest repository to train your VLM in pure PyTorch

nanoVLM provides a minimal PyTorch toolkit to train a Vision Language Model on a free Colab tier. It fuses a SigLIP-based vision transformer with a Llama 3 language backbone, using a Modality Projection (pixel shuffle + linear) to align image and text embeddings for decoding. The post offers quickstart steps: clone the repo and run train.py, or use the Colab notebook to begin training without local setup.

★★★★★·HuggingFace Blog
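
The "pixel shuffle + linear" projection can be pictured with a small pure-Python sketch. It assumes the pixel shuffle here is the token-reducing variant, which folds each block of neighbouring patch features into one longer vector before a linear layer maps it to the language model's width. The function names, sizes, and weights below are illustrative, not taken from the nanoVLM repository.

```python
def pixel_shuffle_tokens(grid, factor):
    """Merge each factor x factor block of feature vectors into one vector.

    grid is a height x width nested list whose cells are feature lists.
    The result has (height / factor) x (width / factor) cells, each holding
    the concatenation of the factor * factor vectors in its block.
    """
    height, width = len(grid), len(grid[0])
    if height % factor or width % factor:
        raise ValueError("grid size must be divisible by factor")
    merged_grid = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            merged = []
            for dy in range(factor):
                for dx in range(factor):
                    merged.extend(grid[top + dy][left + dx])
            row.append(merged)
        merged_grid.append(row)
    return merged_grid


def linear(vector, weights, bias):
    """Plain matrix-vector product plus bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


# A 4 x 4 grid of 2-dimensional patch features (toy values).
grid = [[[float(y * 4 + x), 1.0] for x in range(4)] for y in range(4)]
shuffled = pixel_shuffle_tokens(grid, 2)      # 2 x 2 grid of 8-dim vectors

# Toy projection from 8 dimensions to 3 (stand-in for the text embedding width).
weights = [[0.1] * 8 for _ in range(3)]
bias = [0.0] * 3
projected = linear(shuffled[0][0], weights, bias)
print(len(shuffled) * len(shuffled[0]), "tokens of size", len(shuffled[0][0]))
print(len(projected), "dimensions after projection")
```

Under this assumption the image contributes four times fewer tokens to the language model for a factor of 2, at the cost of wider vectors entering the linear layer.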

651 · MAY 19 · 00:00

article · HuggingFace Blog · last yr.

Microsoft and Hugging Face expand collaboration

Microsoft and Hugging Face are expanding their collaboration to make it easy to deploy open models on Azure through AI Foundry, with more than 10,000 models available. Models are security-checked (safetensors, ProtectAI Guardian, and JFrog) and can be deployed in a few clicks via the Deploy button, after choosing a VM and parameters.

★★★★★·HuggingFace Blog

652 · MAY 15 · 13:13

article · HuggingFace Blog · last yr.

Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models.

Falcon-Edge is a series of universal, fine-tunable LLMs built on BitNet with ternary weights (-1,0,1). It trains end-to-end to yield both non-quantized and quantized variants (bf16, BitNet, pre-quantized) in one pass, available in 1B and 3B sizes with base and instruction-tuned models. Early results on Hugging Face leaderboard v2 show competitive performance for its size, backed by ~1.5T tokens of pre-training and a 1-bit LLM tooling package.

★★★★★·HuggingFace Blog
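
For readers wondering where "1.58bit" comes from: a weight restricted to three values carries log2(3) ≈ 1.58 bits. The sketch below shows one way to map float weights to (-1, 0, 1), using the absmean scaling described for BitNet b1.58. Whether Falcon-Edge uses exactly this recipe is an assumption, and the summary says the models are trained end-to-end rather than quantized after the fact, so treat this purely as an illustration of the ternary format.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Scale by the mean absolute value, round, then clip to {-1, 0, 1}.

    Returns the ternary weights and the scale needed to approximate the
    originals as scale * quantized.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


weights = [0.9, -0.05, 0.4, -1.2, 0.02]   # toy values
quantized, scale = ternary_quantize(weights)
print(quantized)                           # [1, 0, 1, -1, 0]
print(f"bits per ternary weight: {math.log2(3):.2f}")
```

Small weights collapse to 0 and large ones saturate at ±1, so only the sign pattern and a single scale per tensor survive.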

653 · MAY 15 · 00:00

article · HuggingFace Blog · last yr.

The Transformers Library: standardizing model definitions

Transformers is evolving toward a standard model-definition library, aiming to be the pivot across frameworks so an architecture supported by transformers is available in the broader ecosystem. It now covers 300+ architectures with day-0 support and tight interoperability with inference engines (vLLM, SGLang, TGI) as well as llama.cpp and MLX, while simplifying model definitions to lower the barrier to contributions.

★★★★★·HuggingFace Blog

654 · MAY 14 · 00:00

article · HuggingFace Blog · last yr.

Improving Hugging Face Model Access for Kaggle Users

Kaggle and Hugging Face announce an integration that makes Hugging Face models visible directly on Kaggle, with a pre-populated code snippet to load models in Kaggle notebooks. You can authenticate private models with your HF_TOKEN, and consent-gated models require access requests; notebooks referencing HF models will auto-create a Kaggle model page.

★★★★★·HuggingFace Blog

655 · MAY 13 · 00:00

article · HuggingFace Blog · last yr.

Blazingly fast whisper transcriptions with Inference Endpoints

Hugging Face unveils a fast OpenAI Whisper deployment on Inference Endpoints, delivering up to 8x speedups using vLLM and CUDA graphs on NVIDIA GPUs. The stack adds torch.compile, dynamic quantization to float8, and reduced KV cache precision to boost throughput without sacrificing transcription quality, with WER comparable to Transformers baselines across standard datasets.

★★★★★·HuggingFace Blog

656 · MAY 12 · 00:00

article · HuggingFace Blog · last yr.

Vision Language Models (Better, faster, stronger)

Vision Language Models are getting smaller while becoming more capable, with new architectures enabling any-to-any inputs/outputs, multimodal retrieval and agents. The post surveys models like Chameleon/Lumina-mGPT, Qwen 2.5 Omni (Thinker-Talker), MiniCPM-o 2.6, Janus-Pro-7B, and Kimi-VL-A3B-Thinking, plus MoE decoders, RAG, safety, and new benchmarks (MMT-Bench, MMMU-Pro).

★★★★★·HuggingFace Blog

657 · MAY 11 · 00:00

article · HuggingFace Blog · last yr.

LeRobot Community Datasets: The “ImageNet” of Robotics — When and How?

This article presents LeRobot Community Datasets as an attempt to create the ImageNet equivalent for robotics, stressing the need for diverse data and robust curation practices. It identifies current challenges (inconsistent annotations, feature-mapping problems, low-quality episodes) and proposes directions for improving generalization through more open and varied datasets.

★★★★★·HuggingFace Blog

658 · APR 30 · 00:00

article · HuggingFace Blog · last yr.

The 4 Things Qwen-3’s Chat Template Teaches Us

Qwen-3 introduces a more sophisticated chat template: thinking can be turned on or off via enable_thinking, and context management is dynamic thanks to a rolling checkpoint that keeps relevant reasoning during tool calls. The article also highlights better serialization of tool arguments and shows how these choices affect performance and the readability of the conversation flow compared with Qwen-2.5 and QwQ.

★★★★★·HuggingFace Blog

659 · APR 30 · 00:00

article · HuggingFace Blog · last yr.

How to Build an MCP Server with Gradio

Gradio now exposes Python functions as MCP tools, enabling LLMs to call them via an MCP server. The guide shows a concise 5-line example converting a letter-counting function into a tool, launching the server, and wiring it into MCP clients with a config snippet.

★★★★★·HuggingFace Blog

660 · APR 29 · 00:00

article · HuggingFace Blog · last yr.

Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

AutoRound is Intel's weight-only post-training quantization that uses signed gradient descent to jointly optimize weight rounding and clipping for accurate low-bit quantization (INT2–INT8). It claims up to 2.1x higher relative accuracy at 2-bit, quantizes a 72B model in ~37 minutes on A100, and supports mixed-bit tuning, multiple export formats, and recipes (auto-round-best/light) with small calibration sets.

★★★★★·HuggingFace Blog


20 of 834 shown

Issue 139 · Digest

The weekly digest, every Sunday.

20 articles ranked by an agent. No noise, no ads. One-click unsubscribe.

[top 7 days] B.1

01. I turned a $80 RK3562 Android tablet into a Debian Linux workstation · Hacker News (100+ pts)
02. Mullvad exit IPs as a fingerprinting vector · Lobsters
03. Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality · HuggingFace Blog
04. what 262,715 regex questions on stack overflow haven't answered · Lobsters
05. int a = 5; a = a++ + ++a; a = ? (2011) · Lobsters

Colophon · Maker C.1

Quentin Lecocq · @celdama

Fullstack dev · CRO freelance · Lille, FR

Lantern is a side-project — aggregation, AI scoring, weekly digest. Built with Next.js 16, Drizzle, Neon & Claude. One maintainer.
