
Built solo in Lille, FR·v0.6

Dev & AI feed

The best of dev and AI, scored every day by an agent. Filtered, summarized, ranked. No color, no noise — just the substance.

Issue: No. 153
Date: JUN 02, 2026
Edition: EN · DAILY
Sources: 14 active
Articles: 29 today

§ Feed·Vol. 02·No. 153

Last ingest·08:00 UTC+0·Next·08:00


Articles / day: 29

7-day avg.: 18

Feed · 879 articles

sort by score · DESC ↓

661 · JUL 10 · 00:00

article · HuggingFace Blog · 11 mo. ago

Asynchronous Robot Inference: Decoupling Action Prediction and Execution

Asynchronous inference decouples action prediction from execution in robotic policies, reducing runtime lag and enabling replanning with action chunks. The article describes a two-component architecture (PolicyServer and RobotClient) using gRPC to achieve ~2× speedups and continuous operation, and explains why sequential inference falls short.

★★★★★·HuggingFace Blog

662 · JUL 10 · 00:00

mcp · HuggingFace Blog · 11 mo. ago

Building the Hugging Face MCP Server

The Hugging Face MCP Server lets customized AI assistants access the Hub and thousands of apps through a single URL. The article compares MCP transports (Streamable HTTP vs SSE) and outlines three patterns (Direct Response, Request Scoped Streams, and Server Push Streams) with their trade-offs and the connection management each requires. It also covers making the server dynamic and remotely configurable, plus client-side usage hints (TypeScript/Python).

★★★★★·HuggingFace Blog

663 · JUL 10 · 00:00

article · HuggingFace Blog · 11 mo. ago

ScreenEnv: Deploy your full stack Desktop Agent

ScreenEnv is a Python library that runs isolated Ubuntu desktop environments in Docker to test and deploy GUI agents with full desktop control, including window management and file operations. It supports both the Model Context Protocol (MCP) and a Direct Sandbox API, enabling flexible integration with existing backends or AI systems. The article provides setup examples and a quick path to build a custom Desktop Agent using smolagents.

★★★★★·HuggingFace Blog

664 · JUL 09 · 00:00

article · HuggingFace Blog · 11 mo. ago

Creating custom kernels for the AMD MI300

Hugging Face and AMD optimize custom kernels for MI300X to boost Llama 3.1 405B inference in FP8 on 8 GPUs. They combine fused residual/RMS norm/FP8 conversion, SwiGLU, and a Skinny GEMM kernel, with benchmarks and open-source tooling in hf-rocm-kernels.

★★★★★·HuggingFace Blog

665 · JUL 09 · 00:00

article · HuggingFace Blog · 11 mo. ago

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

Reachy Mini is an open-source, programmable robot for human-robot interaction and AI prototyping. It runs Python (and soon JS/Scratch), ships as a kit, and starts at $299, with a wireless compute version at $449. It emphasizes open hardware/software, a growing community, and privacy controls by design.

★★★★★·HuggingFace Blog

666 · JUL 09 · 00:00

mcp · HuggingFace Blog · 11 mo. ago

Upskill your LLMs With Gradio MCP Servers

The article introduces the Model Context Protocol (MCP) and explains how MCP servers can extend LLMs with new abilities via an app-store-like ecosystem. It guides readers to find MCP-compatible spaces in Hugging Face Spaces and demonstrates wiring a Flux.1 Kontext[dev] MCP server to an LLM to edit images from text prompts.

★★★★★·HuggingFace Blog

667 · JUL 08 · 00:00

article · HuggingFace Blog · 11 mo. ago

SmolLM3: smol, multilingual, long-context reasoner

SmolLM3 is a competitive open 3B model (11T tokens) with 128k long context via NoPE and YaRN, and multilingual support across six languages. It uses a Llama-based decoder with Grouped Query Attention, intra-document masking, and training-stability tweaks, plus a dual Instruct/Reasoning model.

★★★★★·HuggingFace Blog

668 · JUL 08 · 00:00

article · HuggingFace Blog · 11 mo. ago

Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure

Hugging Face outlines three major alerts that bolster production infrastructure monitoring. The highlighted NAT Gateway throughput alert acts as an early warning for unusual egress traffic and helps with cost optimization, while other alerts cover Kubernetes API request errors, rate limiting, and a bonus alert for new clusters sending zero metrics.

★★★★★·HuggingFace Blog

669 · JUL 08 · 00:00

article · HuggingFace Blog · 11 mo. ago

Efficient MultiModal Data Pipeline

The post identifies sources of waste in multimodal training data pipelines, such as idle GPUs and excessive padding. It outlines a five-stage approach, from visualizing the dataset to packing batches with a knapsack strategy, to reduce padding and maximize throughput. It also introduces an iterable dataset and a repository for practical reproduction.

★★★★★·HuggingFace Blog
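
To make the padding argument in the summary above concrete, here is a minimal sketch of sequence packing. It uses greedy first-fit-decreasing bin packing as a simple stand-in for the knapsack strategy the post describes; it is not the post's implementation. The function names, the sample lengths, and the `max_len` value are all hypothetical.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample lengths into bins whose total length stays <= max_len.

    Greedy first-fit-decreasing: a simple approximation, assumed here
    for illustration instead of an exact knapsack solver.
    """
    bins: List[List[int]] = []
    totals: List[int] = []
    for length in sorted(lengths, reverse=True):
        if length > max_len:
            raise ValueError(f"sample of length {length} exceeds max_len={max_len}")
        for i, total in enumerate(totals):
            if total + length <= max_len:
                # The sample fits in an existing bin: reuse it.
                bins[i].append(length)
                totals[i] += length
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([length])
            totals.append(length)
    return bins


def padding_tokens(bins: List[List[int]], max_len: int) -> int:
    """Count the padding tokens needed to fill every bin up to max_len."""
    return sum(max_len - sum(b) for b in bins)


if __name__ == "__main__":
    sample_lengths = [700, 300, 512, 512, 100, 900]  # hypothetical token counts
    packed = pack_first_fit_decreasing(sample_lengths, max_len=1024)
    unpacked = [[n] for n in sample_lengths]  # one sample per row, no packing
    print(packed)
    print(padding_tokens(packed, 1024), "vs", padding_tokens(unpacked, 1024))
```

With these made-up lengths, packing needs three rows instead of six, which is the kind of padding reduction the summary refers to.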

670 · JUL 04 · 12:25

article · HuggingFace Blog · 11 mo. ago

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

Announces the NeurIPS 2025 E2LM competition, which aims to capture early-training signals of LLMs in the scientific domain. Submissions will be evaluated with ScoreSQ, ScoreRC, and ScoreCS, combined as Score = α1·ScoreSQ + α2·ScoreRC + α3·ScoreCS with weights α1=0.5, α2=0.1, α3=0.4. Registration is on Hugging Face and evaluation relies on lm-evaluation-harness, with phases running from July to November 2025 and results expected on November 4, 2025.

★★★★★·HuggingFace Blog
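
As a worked illustration of the weighted score quoted in the summary above, the sketch below combines three component scores with the stated weights. Only the formula and the weights come from the summary; the function name and the sample component values are hypothetical.

```python
# Weights as stated in the summary: alpha1=0.5, alpha2=0.1, alpha3=0.4.
WEIGHTS = {"sq": 0.5, "rc": 0.1, "cs": 0.4}


def combined_score(score_sq: float, score_rc: float, score_cs: float) -> float:
    """Score = alpha1*ScoreSQ + alpha2*ScoreRC + alpha3*ScoreCS."""
    return (
        WEIGHTS["sq"] * score_sq
        + WEIGHTS["rc"] * score_rc
        + WEIGHTS["cs"] * score_cs
    )


if __name__ == "__main__":
    # Hypothetical component values, for illustration only:
    # 0.5*0.8 + 0.1*0.5 + 0.4*0.6 = 0.40 + 0.05 + 0.24
    print(round(combined_score(0.8, 0.5, 0.6), 4))
```

Because the weights sum to 1, a submission scoring 1.0 on all three components gets a combined score of 1.0, and ScoreRC contributes the least to the total.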

671 · JUL 01 · 00:00

article · HuggingFace Blog · 11 mo. ago

Training and Finetuning Sparse Embedding Models with Sentence Transformers

Sentence Transformers can finetune sparse encoder/embedding models for retrieval, hybrid search, and reranking. The post outlines training components (model, datasets, losses, trainer, evaluators) and provides practical examples, including using naver/splade-v3 and decoding sparse embeddings for interpretability.

★★★★★·HuggingFace Blog

672 · JUN 27 · 21:09

article · HuggingFace Blog · 11 mo. ago

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

NVIDIA Llama Nemotron Nano VL is an 8B vision-language model optimized for document identification and OCR, offering high accuracy on OCRBench v2 and reliable extraction of text, tables, and visual elements. Available on Hugging Face, it builds on Llama-3.1-8B-Instruct and C-RADIOv2-VLM-H and can be post-trained with NVIDIA NeMo; a notebook tutorial shows how to quickly build automation solutions for invoices, contracts, and healthcare documents.

★★★★★·HuggingFace Blog

673 · JUN 26 · 00:00

article · HuggingFace Blog · 11 mo. ago

Gemma 3n fully available in the open-source ecosystem!

Gemma 3n is now available in major open-source libraries (transformers, timm, llama.cpp) and runs on-device with multimodal support (text, image, audio, video). It ships in two sizes (5B and 8B actual parameters, with VRAM needs comparable to 2B and 4B models), with a flexible MatFormer architecture and mix-and-match sub-models. It includes a MobileNet-V5 vision encoder and a USM audio encoder, supports 140 languages, and offers a Hugging Face Space demo.

★★★★★·HuggingFace Blog

674 · JUN 23 · 00:00

article · HuggingFace Blog · 11 mo. ago

Transformers backend integration in SGLang

SGLang now supports Hugging Face transformers as a backend, enabling high-throughput, low-latency inference for any transformers-compatible model. It can auto-fall back to transformers or be forced with impl='transformers', and adds features like RadixAttention for efficiency. The article includes practical code snippets and deployment patterns (offline engine, server, and OpenAI-compatible API).

★★★★★·HuggingFace Blog

675 · JUN 19 · 00:00

article · HuggingFace Blog · last yr.

(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

A new post shows how to fine-tune FLUX.1-dev efficiently on consumer hardware via QLoRA with the diffusers library, targeting peak VRAM under ~10 GB on a single GPU. It explains how to load a 4-bit quantized base model, train FP16/BF16 LoRA adapters, and use an 8-bit AdamW optimizer, and it discusses options for loading or merging LoRA adapters, with results demonstrated on an RTX 4090.

★★★★★·HuggingFace Blog

676 · JUN 16 · 00:00

article · HuggingFace Blog · last yr.

Groq on Hugging Face Inference Providers

Groq is now a supported Inference Provider on the Hugging Face Hub, enabling serverless inference on model pages and tight integration with the Python and JavaScript SDKs. Groq's LPUs promise lower latency and higher throughput for LLMs, with support for models such as Meta's Llama 4 and Qwen's QwQ-32B, and two billing modes: a direct provider key or routing through HF.

★★★★★·HuggingFace Blog

677 · JUN 12 · 08:00

article · HuggingFace Blog · last yr.

How Long Prompts Block Other Requests - Optimizing LLM Performance

The article analyzes how long prefill prompts can block the prefill queue in a multi-request setting, and explains that decoding steps are light but must be sequential. It discusses two patterns - chunked prefill and request-parallel prefills - and why long prompts undermine throughput, with implications for vLLM scheduling.

★★★★★·HuggingFace Blog
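
The blocking effect described in the summary above can be sketched with a toy model. This is not vLLM's scheduler: it assumes prefill cost is proportional to token count, ignores decode steps, and serves requests round-robin. The function name, prompt lengths, and chunk size are hypothetical.

```python
from collections import deque
from typing import List, Optional


def prefill_finish_times(prompt_lens: List[int], chunk_size: Optional[int] = None) -> List[int]:
    """Return the time (in processed tokens) at which each prefill completes.

    chunk_size=None processes each prompt in one piece, in arrival order.
    Otherwise each turn processes at most chunk_size tokens of one request,
    then moves to the next request (round-robin).
    """
    if chunk_size is not None and chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    queue = deque(enumerate(prompt_lens))
    finish = [0] * len(prompt_lens)
    clock = 0  # total tokens processed so far
    while queue:
        idx, remaining = queue.popleft()
        take = remaining if chunk_size is None else min(chunk_size, remaining)
        clock += take
        remaining -= take
        if remaining > 0:
            queue.append((idx, remaining))  # unfinished: back of the queue
        else:
            finish[idx] = clock
    return finish


if __name__ == "__main__":
    prompts = [8000, 100]  # one long prompt arrives just before a short one
    print(prefill_finish_times(prompts))                  # unchunked
    print(prefill_finish_times(prompts, chunk_size=512))  # chunked
```

In this toy model the short request waits behind all 8000 tokens when prefills run whole, but finishes after 612 tokens of work with chunking, while the long request still completes at the same total of 8100.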

678 · JUN 12 · 00:00

article · HuggingFace Blog · last yr.

Learn the Hugging Face Kernel Hub in 5 Minutes

Hugging Face's Kernel Hub lets Python apps load pre-compiled, optimized kernels directly from the Hub, avoiding local builds. It includes a quick code example to fetch a kernel (e.g., activation) and apply it, and discusses integrating kernels into models like RMSNorm and FlashAttention. The article also covers performance benchmarking and real-world use cases.

★★★★★·HuggingFace Blog

679 · JUN 12 · 00:00

article · HuggingFace Blog · last yr.

Featherless AI on Hugging Face Inference Providers

Featherless AI is now supported as an Inference Provider on the Hugging Face Hub, enabling serverless inference directly on model pages and through the JS and Python SDKs. It supports a wide range of open-source models and offers two calling modes (custom key or routed through HF), with usage billed directly to the user's account. Python and JS examples show how to use it.

★★★★★·HuggingFace Blog

680 · JUN 11 · 18:27

article · HuggingFace Blog · last yr.

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

This article introduces GR00T N1.5 and explains how to post-train it on the LeRobot SO-101 arm. It offers a step-by-step tutorial covering installation, dataset preparation, and fine-tuning, followed by evaluation and deployment. Commands and configuration files (modality.json, scripts/gr00t_finetune.py) allow quick adaptation of the model to your robot.

★★★★★·HuggingFace Blog

Page 34 / 44


20 of 879 shown


[top 7 days] B.1

01. thunderbolt-ibverbs: We have InfiniBand at home · Lobsters
02. Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic · HuggingFace Blog
03. Five Years of Trying to Add Recursion to lychee · Lobsters
04. ELF Linker Improvements in Zig · Lobsters
05. UTF8 email with DMA: DragonFly Mail Agent · Lobsters

Colophon · Maker C.1

Quentin Lecocq · @celdama

Fullstack dev · CRO freelance · Lille, FR

Lantern is a side-project — aggregation, AI scoring, weekly digest. Built with Next.js 16, Drizzle, Neon & Claude. One maintainer.


