
© 2026 Lantern·Set in Geist Mono·Sources [52]·Methodology·Privacy

Built solo in Lille, FR·v0.6

Dev & AI feed

The best of dev and AI, scored every day by an agent. Filtered, summarized, ranked. No color, no noise — just the substance.

Issue: No. 156
Date: JUN 05, 2026
Edition: EN · DAILY
Sources: 14 active
Articles: 0 today

§ Feed·Vol. 02·No. 156

Last ingest·08:00 UTC+0·Next·08:00

Articles / day: 0

7-day avg.: 24

Feed · 903 articles

sort by score · DESC ↓

701 · JUN 12 · 08:00

article · HuggingFace Blog · last yr.

How Long Prompts Block Other Requests - Optimizing LLM Performance

The article analyzes how long prefill prompts can block the prefill queue in a multi-request setting, and explains that decoding steps are light but must be sequential. It discusses two patterns - chunked prefill and request-parallel prefills - and why long prompts undermine throughput, with implications for vLLM scheduling.

★★★★★·HuggingFace Blog
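
The blocking effect described in this summary can be illustrated with a toy model. The sketch below is not vLLM's scheduler: it assumes all requests arrive at once, counts cost in prompt tokens only, and uses a hypothetical round-robin policy with a made-up `chunk_size`. The function name `prefill_finish_times` is also an illustration, not an API from the article.

```python
from collections import deque

def prefill_finish_times(prompt_lengths, chunk_size=None):
    """Toy model: return, per request, how many prefill tokens have been
    processed in total by the time that request's prefill completes.

    chunk_size=None  -> each prefill runs to completion in arrival order.
    chunk_size=N     -> round-robin, at most N tokens per turn (hypothetical policy).
    """
    queue = deque(enumerate(prompt_lengths))
    finished_at = {}
    clock = 0  # total tokens of prefill work done so far
    while queue:
        req, left = queue.popleft()
        step = left if chunk_size is None else min(left, chunk_size)
        clock += step
        left -= step
        if left == 0:
            finished_at[req] = clock
        else:
            queue.append((req, left))  # unfinished prefill goes to the back
    return [finished_at[i] for i in range(len(prompt_lengths))]

prompts = [8000, 100]  # one long prompt arrives just ahead of a short one
fifo = prefill_finish_times(prompts)
chunked = prefill_finish_times(prompts, chunk_size=512)
print(fifo)     # [8000, 8100]
print(chunked)  # [8100, 612]
```

In this model the short request waits behind 8,000 tokens of work without chunking, and behind only 512 with it, while the long request finishes slightly later.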

702 · JUN 12 · 00:00

article · HuggingFace Blog · last yr.

Featherless AI on Hugging Face Inference Providers

Featherless AI is now supported as an Inference Provider on the Hugging Face Hub, enabling serverless inference directly from model pages and through the JS and Python SDKs. It supports a wide range of open-source models and offers two calling modes (custom key or routed by HF), with usage billed directly to the user's account. Python and JS examples show how to call Featherless AI.

★★★★★·HuggingFace Blog

703 · JUN 12 · 00:00

article · HuggingFace Blog · last yr.

Learn the Hugging Face Kernel Hub in 5 Minutes

Hugging Face's Kernel Hub lets Python apps load pre-compiled, optimized kernels directly from the Hub, avoiding local builds. It includes a quick code example to fetch a kernel (e.g., activation) and apply it, and discusses integrating kernels into models like RMSNorm and FlashAttention. The article also covers performance benchmarking and real-world use cases.

★★★★★·HuggingFace Blog

704 · JUN 11 · 18:27

article · HuggingFace Blog · last yr.

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

This article introduces GR00T N1.5 and explains how to post-train it on the LeRobot SO-101 arm. It offers a step-by-step tutorial covering installation, dataset preparation, and fine-tuning, followed by evaluation and deployment. Commands and configurations (modality.json, scripts/gr00t_finetune.py) allow quick adaptation of the model to your robot.

★★★★★·HuggingFace Blog

705 · JUN 11 · 00:00

article · HuggingFace Blog · last yr.

Introducing Training Cluster as a Service - a new collaboration with NVIDIA

Hugging Face and NVIDIA announce Training Cluster as a Service to give researchers easy access to large GPU clusters for training foundational models, paying only for training durations. The solution combines NVIDIA DGX Cloud Lepton, regional capacity, and Hugging Face tooling to provision, price, and manage clusters, with user requests via hf.co/training-cluster.

★★★★★·HuggingFace Blog

706 · JUN 06 · 00:00

article · HuggingFace Blog · last yr.

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

ScreenSuite is a comprehensive benchmarking suite for evaluating GUI agents, covering perception, grounding, and single-step and multi-step actions across 13 benchmarks. It provides Docker environments (Ubuntu/Android) and remote sandboxes for testing GUI agents under reproducible conditions, while remaining vision-only.

★★★★★·HuggingFace Blog

707 · JUN 04 · 00:00

article · HuggingFace Blog · last yr.

KV Cache from scratch in nanoVLM

The article describes implementing KV caching in nanoVLM (PyTorch), which reduces redundant computation during autoregressive generation by reusing previously computed K and V, yielding a speedup of about 38%. It clarifies where the redundancy arises in attention and provides a minimal PyTorch example to illustrate the approach.

★★★★★·HuggingFace Blog
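
To make the redundancy concrete, here is a minimal sketch of single-head attention decoding with and without a KV cache. It is written in plain Python rather than PyTorch and is not nanoVLM's code; the 2-dimensional projection matrices, the token vectors, and all function names are hypothetical values chosen for illustration.

```python
import math

def project(x, w):
    """Multiply row vector x by matrix w (given as a list of rows)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def attend(q, keys, values):
    """Softmax attention of one query over all keys/values seen so far."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

# Hypothetical toy projection matrices.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.4]]
W_V = [[1.0, 0.3], [0.0, 1.0]]

def decode_without_cache(tokens):
    """Recompute K and V for the entire prefix at every step."""
    outputs, projections = [], 0
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x, W_K) for x in prefix]
        values = [project(x, W_V) for x in prefix]
        projections += 2 * len(prefix)
        outputs.append(attend(project(tokens[t], W_Q), keys, values))
    return outputs, projections

def decode_with_cache(tokens):
    """Project only the newest token and append it to the cache."""
    keys, values, outputs, projections = [], [], [], 0
    for x in tokens:
        keys.append(project(x, W_K))
        values.append(project(x, W_V))
        projections += 2
        outputs.append(attend(project(x, W_Q), keys, values))
    return outputs, projections

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, slow_count = decode_without_cache(tokens)
fast, fast_count = decode_with_cache(tokens)
print(slow_count, fast_count)  # 20 8
```

Both paths produce the same outputs. The cached version performs 2 K/V projections per step instead of 2 per prefix token, so the count grows linearly rather than quadratically with sequence length.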

708 · JUN 03 · 15:04

article · HuggingFace Blog · last yr.

Real-Time AI Sound Generation on Arm: A Personal Tool for Creative Freedom

An on-device, Arm-powered sound generator lets you describe a sound and obtain a .wav within seconds, without cloud or GPU. Built around Stability AI’s Stable Audio Open model, it runs via PyTorch and TorchAudio with optimized multithreading, and integrates seamlessly with Ableton Live.

★★★★★·HuggingFace Blog

709 · JUN 03 · 13:27

article · HuggingFace Blog · last yr.

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

Surfer-H is a web-native agent powered by the Holo1 family of Action Vision Language Models (VLMs), enabling GUI automation and precise UI localization. Holo1-3B and Holo1-7B achieve up to 76.2% UI localization accuracy on benchmarks, with open-source releases on Hugging Face and a WebClick benchmark of 1,639 tasks. Surfer-H operates entirely in-browser with a three-component architecture (Policy, Localizer, Validator) and claims cost-efficient performance.

★★★★★·HuggingFace Blog

710 · JUN 03 · 00:00

article · HuggingFace Blog · last yr.

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

TRL now runs vLLM co-located with training, sharing GPUs to eliminate idle time and HTTP overhead. By embedding vLLM in the same process group, training and inference take turns on the same devices, with torchrun compatibility, TP/DP support, and GRPO-enabled workflows. The article covers design, implementation notes, and benchmarks (1.5B, 7B, 72B) plus a train_grpo_colocate.py script to try.

★★★★★·HuggingFace Blog

711 · JUN 03 · 00:00

article · HuggingFace Blog · last yr.

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

SmolVLA-450M is an open-source, compact Vision-Language-Action model for robotics that runs on consumer hardware. It is pretrained on publicly licensed community data and trainable on a single consumer GPU, with asynchronous inference boosting throughput and strong performance on simulation and real-world tasks.

★★★★★·HuggingFace Blog

712 · MAY 28 · 00:00

article · HuggingFace Blog · last yr.

CodeAgents + Structure: A Better Way to Execute Actions

This article shows that forcing CodeAgents to generate thoughts and code in structured JSON can outperform traditional approaches on benchmarks such as SmolBench. It explains why structure improves action execution and offers implementation advice and practical use cases for AI developers.

★★★★★·HuggingFace Blog

713 · MAY 25 · 00:00

article · HuggingFace Blog · last yr.

Liger GRPO meets TRL

Report on debugging Liger GRPO loss with DeepSpeed ZeRO-3 using Qwen/Qwen2.5-0.5B-Instruct in bf16, highlighting a shape mismatch during training. The traceback traces through grpo_loss and fused_linear_ppo, pointing to a forward pass issue in the Liger kernel. No fix is shown, but it identifies the code paths to inspect (grpo_loss, fused_linear_ppo).

★★★★★·HuggingFace Blog

714 · MAY 23 · 00:00

article · HuggingFace Blog · last yr.

Dell Enterprise Hub is all you need to build AI on premises

Dell Enterprise Hub now ships a complete suite of on‑prem AI models and ready‑to‑deploy applications (e.g., Llama 4 Maverick, OpenWebUI), optimized for Dell AI Server platforms via Docker/Kubernetes. It adds an Application Catalog and a dell-ai CLI/Python SDK to run everything from development to on‑prem deployment, plus on‑device models for Dell AI PCs powered by Intel/Qualcomm NPUs.

★★★★★·HuggingFace Blog

715 · MAY 23 · 00:00

article · HuggingFace Blog · last yr.

Tiny Agents in Python: a MCP-powered agent in ~70 lines of code

This post shows how to build a tiny Python agent powered by MCP. It explains a simple loop that pulls tools from MCP servers using an extended huggingface_hub client and runs via a CLI demo.

★★★★★·HuggingFace Blog

716 · MAY 21 · 06:52

article · HuggingFace Blog · last yr.

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Falcon-H1 is a family of six open-source LLMs (0.5B–34B) using a hybrid attention + SSM design, delivering faster inference and lower memory with strong cross-task performance. Available as base and instruction-tuned variants, they support 256K context, 18 native languages (with a tokenizer scalable to 100+), and are optimized for edge-to-large deployments under Apache 2.0.

★★★★★·HuggingFace Blog

717 · MAY 21 · 06:35

article · HuggingFace Blog · last yr.

Falcon-Arabic: A Breakthrough in Arabic Language Models

Falcon-Arabic is a 7B multilingual LLM built on Falcon 3, optimized for Arabic, covering both MSA and dialects. It supports a 32k-token context, enabling long-context tasks like RAG and in-depth content creation, and reportedly outperforms Arabic LLMs of similar size as well as some larger ones. Built by TII, it adapts an existing multilingual foundation rather than training from scratch, delivering an efficient, open-source option for Arabic AI.

★★★★★·HuggingFace Blog

718 · MAY 21 · 00:00

article · HuggingFace Blog · last yr.

Exploring Quantization Backends in Diffusers

This article explores quantization backends in Diffusers for heavy diffusion models such as Flux, comparing BF16 with 4-bit and 8-bit quantization. It details the backends (bitsandbytes, GGUF, torchao, Quanto, FP8) and the key components (text encoders and transformer), with memory and inference-time figures to guide the practical choice.

★★★★★·HuggingFace Blog

719 · MAY 21 · 00:00

article · HuggingFace Blog · last yr.

nanoVLM: The simplest repository to train your VLM in pure PyTorch

nanoVLM provides a minimal PyTorch toolkit to train a Vision Language Model on a free Colab tier. It fuses a SigLIP-based vision transformer with a Llama 3 language backbone, using a Modality Projection (pixel shuffle + linear) to align image and text embeddings for decoding. The post offers quickstart steps: clone the repo and run train.py, or use the Colab notebook to begin training without local setup.

★★★★★·HuggingFace Blog

720 · MAY 19 · 00:00

article · HuggingFace Blog · last yr.

Microsoft and Hugging Face expand collaboration

Microsoft and Hugging Face are expanding their collaboration to make it easy to deploy open models on Azure via AI Foundry, with more than 10,000 models available. Models are security-checked (safetensors, ProtectAI Guardian, and JFrog) and deployable in a few clicks via the Deploy button, after choosing a VM and parameters.

★★★★★·HuggingFace Blog

Page 36 / 46

20 of 903 shown

Issue 156 · Digest

The weekly digest, every Sunday.

20 articles ranked by an agent. No noise, no ads. One-click unsubscribe.

[top 7 days] B.1

01. thunderbolt-ibverbs: We have InfiniBand at home · Lobsters
02. Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic · HuggingFace Blog
03. Five Years of Trying to Add Recursion to lychee · Lobsters
04. Use your Nvidia GPU's VRAM as swap space on Linux · Hacker News (100+ pts)
05. PaceVer (an alternative to SemVer, for mobile apps) · Lobsters

Colophon · Maker C.1

Quentin Lecocq · @celdama

Fullstack dev · CRO freelance · Lille, FR

Lantern is a side-project — aggregation, AI scoring, weekly digest. Built with Next.js 16, Drizzle, Neon & Claude. One maintainer.
