© 2026 Lantern·Set in Geist Mono·Sources [52]·Methodology·Privacy

Built solo in Lille, FR·v0.6

Dev & AI feed

The best of dev and AI, scored every day by an agent. Filtered, summarized, ranked. No color, no noise — just the substance.

Issue: No. 141
Date: MAY 21, 2026
Edition: EN · DAILY
Sources: 14 active
Articles: 42 today

§ Feed·Vol. 02·No. 141

Last ingest·08:00 UTC+0·Next·08:00

Articles / day: 42

7-day avg.: 42

Mon → Sun: -62%

Feed · 851 articles

sort by score·DESC ↓

681 · APR 22 · 18:33

article · HuggingFace Blog · last yr.

Finetuning olmOCR to be a faithful OCR-Engine

Researchers fine-tuned olmOCR-7B-0225-preview to preserve header and footer information, making it a more faithful OCR engine for business documents. They created a dataset of 8,000 documents with Qwen2.5-VL-72B-Instruct, trained with 4 gradient accumulation steps on 8xH100 for 2.5 epochs, and evaluated on header/footer-inclusive data using document anchoring. The result is a practical improvement for invoices and other layout-rich texts.

★★★★★·HuggingFace Blog

682 · APR 16 · 10:10

article · HuggingFace Blog · last yr.

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Explains the two stages of token generation for LLMs—prefill, where input tokens are processed in parallel to produce the first token, and decode, where subsequent tokens are generated sequentially using a KV cache. It defines latency metrics (time to first token and time per output token) and analyzes how concurrent requests and batching affect throughput on multi-GPU setups. It also hints at batching patterns like prefill-first and chunked prefill to optimize latency.

★★★★★·HuggingFace Blog
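
The two latency metrics named in this summary can be computed from per-token timestamps. The sketch below is a minimal illustration, not code from the article: `RequestTrace` and both function names are hypothetical, and it assumes time per output token is averaged over the decode phase only (the tokens after the first one).

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical record of one request's timing, in seconds."""
    sent_at: float            # when the request was submitted
    token_times: list[float]  # arrival time of each output token


def time_to_first_token(trace: RequestTrace) -> float:
    # Covers queueing plus the prefill stage, which produces the first token.
    return trace.token_times[0] - trace.sent_at


def time_per_output_token(trace: RequestTrace) -> float:
    # Average gap between consecutive tokens during the decode stage.
    n = len(trace.token_times)
    if n < 2:
        return 0.0
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
ttft = time_to_first_token(trace)
tpot = time_per_output_token(trace)
print(f"TTFT={ttft:.2f}s TPOT={tpot:.2f}s")
```

With these made-up timestamps the first token arrives after 0.5 s and the remaining four tokens arrive 0.25 s apart.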

683 · APR 16 · 00:00

article · HuggingFace Blog · last yr.

Cohere on Hugging Face Inference Providers

Cohere becomes an inference provider on the Hugging Face Hub, enabling serverless inference via Cohere and Cohere Labs across a range of enterprise-optimized models. Highlights include long contexts (256k on the A-03-2025 model), multilingual support (23 languages), and RAG with citations, safety, and agentic tooling. The article covers usage via the UI, the client SDKs, and a Colab notebook for testing, with Python examples using huggingface_hub.

★★★★★·HuggingFace Blog

684 · APR 16 · 00:00

article · HuggingFace Blog · last yr.

17 Reasons Why Gradio Isn't Just Another UI Library

Gradio is framed as a full framework for ML apps, not just a UI library, offering features like universal API access, an Interactive API Recorder, SSR for fast apps, automatic queueing and real-time streaming, and enterprise-grade security. The piece lists 17 capabilities that differentiate it for production ML workflows.

★★★★★·HuggingFace Blog

685 · APR 16 · 00:00

article · HuggingFace Blog · last yr.

Introducing HELMET: Holistically Evaluating Long-context Language Models

HELMET introduces a comprehensive benchmark for evaluating long-context language models, addressing the shortcomings of perplexity and synthetic tasks by emphasizing diversity, controllability, and reliability. The blog reports evaluation across 59 LCLMs, highlights real-world task gaps, and provides a quickstart guide and links to code, data, and the paper for practical replication.

★★★★★·HuggingFace Blog

686 · APR 14 · 00:00

article · HuggingFace Blog · last yr.

4M Models Scanned: Protect AI + Hugging Face 6 Months In

Protect AI and Hugging Face expanded Guardian's threat detection with four new modules (PAIT-ARV-100, PAIT-JOBLIB-101, PAIT-TF-200, PAIT-LMAFL-300) to cover more formats and obfuscation techniques. The integration emphasizes a zero-trust security stance with inline alerts on Hugging Face and a huntr bug‑bounty program, reporting 4.47M scans and 352k unsafe issues.

★★★★★·HuggingFace Blog

687 · APR 14 · 00:00

article · HuggingFace Blog · last yr.

Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition

Hugging Face acquires Pollen Robotics to push open-source robotics, extending LeRobot with Reachy 2, an open humanoid robot used in labs. Reachy 2 is open-source and VR-compatible, aimed at research and education, and can be ordered for $70,000.

★★★★★·HuggingFace Blog

688 · APR 11 · 14:21

article · HuggingFace Blog · last yr.

Visual Salamandra: Pushing the Boundaries of Multimodal Understanding

Visual Salamandra extends the Salamandra 7B LLM to images and video via Google's SigLIP encoder and late-fusion, enabling vision-language alignment. It uses a four-phase training pipeline (projector pre-training, high-quality vision pretraining, instruction tuning, full multimodal tuning) with 6.1M instructions, prioritizing multilingual European data.

★★★★★·HuggingFace Blog

689 · APR 09 · 00:00

article · HuggingFace Blog · last yr.

Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC

Hugging Face and Cloudflare have teamed up to give FastRTC developers instant access to enterprise-grade WebRTC infrastructure via a Hugging Face token, combining FastRTC's low-code real-time streams with Cloudflare's global TURN network. The integration allows free streaming up to 10 GB per month and provides a ready-made path to building low-latency voice and video apps, with a sample voice chat demo using Llama 4.

★★★★★·HuggingFace Blog

690 · APR 08 · 00:00

article · HuggingFace Blog · last yr.

Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More

The Arabic-Leaderboards Space unifies Arabic evaluations, housing AraGen-03-25 and Arabic Instruction Following, with plans to add more modalities. The AraGen-03-25 release expands to 340 QA/Reasoning/Orthography pairs, uses blind testing for fair evaluation, and shares Claude-3.5-Sonnet results to invite community review.

★★★★★·HuggingFace Blog

691 · APR 05 · 00:00

article · HuggingFace Blog · last yr.

Welcome Llama 4 Maverick & Scout on Hugging Face

Meta's Llama 4 Maverick (~400B) and Llama 4 Scout (~109B) are Mixture-of-Experts LLMs with 17B active parameters and native multimodality (text + images). They integrate with Hugging Face transformers and TGI, with Scout accessible on a single GPU via 4-/8-bit quantization and Maverick in BF16/FP8; Instruct variants support context lengths up to 1M tokens. Checkpoints are on the Hugging Face Hub under meta-llama, with Xet storage and the Llama 4 Community License.

★★★★★·HuggingFace Blog

692 · APR 04 · 00:00

article · HuggingFace Blog · last yr.

Journey to 1 Million Gradio Users!

Gradio evolved from a single high-level interface to a modular Blocks API, enabling flexible app composition. Its growth relied on investing in primitives, embedding virality via share links, and focusing on a growing niche. The key lesson: favor low-level components and rapid iteration to scale OSS tooling.

★★★★★·HuggingFace Blog

693 · APR 03 · 00:00

article · HuggingFace Blog · last yr.

The NLP Course is becoming the LLM Course

Hugging Face is refreshing its NLP course into The LLM Course, adding chapters on fine-tuning, inference, and retrieval while retaining core NLP topics. It emphasizes open-source collaboration, community inputs, and interactive exercises with live sessions when beneficial. Materials will align with transformers, Spaces, and the Hugging Face Hub, inviting contributors.

★★★★★·HuggingFace Blog

694 · APR 02 · 13:33

article · HuggingFace Blog · last yr.

Efficient Request Queueing – Optimizing LLM Performance

Serving LLMs to many users in parallel is challenging due to GPU contention. The proposed fix is fair scheduling at the LLM-Server with per-user queues and a round-robin scheduler, preserving order before reaching the backend. Extensions include KV-cache-aware routing and back-end priority ideas.

★★★★★·HuggingFace Blog
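
The per-user queues and round-robin scheduler described in this summary can be sketched in a few lines. This is an illustrative sketch only, not the article's implementation: `FairScheduler`, `submit`, `next_request`, and `drain` are hypothetical names, and it assumes one request is dispatched per user per turn.

```python
from collections import OrderedDict, deque


class FairScheduler:
    """Hypothetical round-robin scheduler with one FIFO queue per user."""

    def __init__(self):
        # user_id -> deque of pending requests; dict order is the turn order.
        self._queues = OrderedDict()

    def submit(self, user_id, request):
        self._queues.setdefault(user_id, deque()).append(request)

    def next_request(self):
        """Return (user_id, request) for the user whose turn it is, or None."""
        if not self._queues:
            return None
        user_id, queue = next(iter(self._queues.items()))
        request = queue.popleft()  # FIFO keeps each user's own order
        if queue:
            self._queues.move_to_end(user_id)  # back of the line
        else:
            del self._queues[user_id]
        return user_id, request


def drain(scheduler):
    """Collect requests in the order they would be sent to the backend."""
    order = []
    while (item := scheduler.next_request()) is not None:
        order.append(item[1])
    return order


scheduler = FairScheduler()
for user, req in [("alice", "a1"), ("alice", "a2"), ("alice", "a3"),
                  ("bob", "b1"), ("carol", "c1"), ("carol", "c2")]:
    scheduler.submit(user, req)

order = drain(scheduler)
print(order)
```

Although alice submits three requests before anyone else, bob and carol each get a turn after her first one, so a single heavy user cannot starve the others.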

695 · MAR 31 · 00:00

article · HuggingFace Blog · last yr.

How Hugging Face Scaled Secrets Management for AI Infrastructure

Hugging Face faced increasing secret sprawl across AWS, Azure, and GCP and replaced Vault with Infisical for centralized, developer-friendly secrets management. They integrated Infisical with Terraform and the Kubernetes Operator to auto-sync updates via an InfisicalSecret CRD, enabling automatic secret reloads in deployments and tighter RBAC with SSO. The migration also restructured projects into distinct infrastructure and application domains to standardize rotation and support IaC practices.

★★★★★·HuggingFace Blog

696 · MAR 28 · 00:00

article · HuggingFace Blog · last yr.

Accelerating LLM Inference with TGI on Intel Gaudi

Text Generation Inference now integrates Intel Gaudi support directly into TGI, with a multi-backend architecture and Gaudi1/2/3 compatibility. The article presents the benefits (hardware diversity, cost, production-ready features), lists the optimized models (Llama 3.x, Mistral, Mixtral), and gives a quick start (Docker plus a curl example) for launching TGI on Gaudi. It also mentions FP8 via Intel Neural Compressor and invites readers to test and contribute.

★★★★★·HuggingFace Blog

697 · MAR 26 · 18:47

article · HuggingFace Blog · last yr.

Open R1: Update #4

DeepSeek publishes Open R1: Update #4, an improved version of DeepSeek-V3 with an MIT license and stronger instruction-following and coding capabilities. Benchmarks show gains (MMLU-Pro 81.2, GPQA 68.4, AIME 59.4, LiveCodeBench 49.2) along with progress in web development, Chinese writing, and function calling. It can be used via Inference Providers (Fireworks, Hyperbolic, Novita) and Text Generation Inference, with speculation about the fine-tuning (pre-training and post-training).

★★★★★·HuggingFace Blog

698 · MAR 26 · 00:00

article · HuggingFace Blog · last yr.

Training and Finetuning Reranker Models with Sentence Transformers

This blogpost shows how to finetune a reranker (Cross Encoder) with Sentence Transformers for domain-specific retrieval, detailing datasets, loss functions, training arguments, evaluators, and the trainer. It demonstrates that small, domain-tuned models can outperform public rerankers and scale with larger bases, in a two-stage retrieve-and-rerank setup.

★★★★★·HuggingFace Blog
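
The two-stage retrieve-and-rerank setup mentioned in this summary can be sketched without any model. The code below is a toy illustration under loud assumptions: it does not use the Sentence Transformers API, `retrieve`, `rerank`, and `toy_pair_score` are hypothetical names, and the Jaccard scorer merely stands in for a trained Cross Encoder that would score each (query, document) pair jointly.

```python
def tokens(text):
    return set(text.lower().split())


def retrieve(query, corpus, k):
    """Stage 1: cheap candidate retrieval by raw word-overlap count."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [corpus[i] for score, i in scored[:k] if score > 0]


def toy_pair_score(query, doc):
    """Stand-in for a Cross Encoder: Jaccard similarity of the word sets."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d)


def rerank(query, candidates, score_pair):
    """Stage 2: re-order the small candidate set with the costlier scorer."""
    return sorted(candidates, key=lambda doc: score_pair(query, doc), reverse=True)


corpus = [
    "this long post mentions finetune once and reranker once and model once among other things",
    "finetune reranker",
    "gradio dataframe update",
]
query = "finetune reranker model"

candidates = retrieve(query, corpus, k=2)
ranked = rerank(query, candidates, toy_pair_score)
print(ranked[0])
```

The first stage prefers the long document because it matches more query words, while the second stage promotes the short, focused one. In a real pipeline the second scorer would be the fine-tuned reranker.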

699 · MAR 24 · 00:00

article · HuggingFace Blog · last yr.

Introducing Gradio's new Dataframe!

Gradio released a major update to gr.Dataframe, adding multi-cell selection, row numbers, pinned columns, and new UI controls like a copy button and fullscreen view. The update also improves accessibility, styling, and supports static columns, search, and filtering for more intuitive data exploration. Gradio notes 70 issues resolved in six weeks and provides upgrade instructions and a quick example to get started.

★★★★★·HuggingFace Blog

700 · MAR 21 · 00:00

article · HuggingFace Blog · last yr.

The New and Fresh analytics in Inference Endpoints

The article introduces updates to the Inference Endpoints analytics dashboard, highlighting real-time metrics, faster data loading, and customizable time ranges with auto-refresh. It also adds a replica lifecycle view to monitor state transitions across multiple replicas, aiding monitoring and debugging. Feedback is invited as the features evolve.

★★★★★·HuggingFace Blog

20 of 851 shown

Issue 141 · Digest

The weekly digest, every Sunday.

20 articles ranked by an agent. No noise, no ads. One-click unsubscribe.

[top 7 days] B.1

01. Chasing down why installing the kernel segfaulted · Lobsters
02. I turned a $80 RK3562 Android tablet into a Debian Linux workstation · Hacker News (100+ pts)
03. Mullvad exit IPs as a fingerprinting vector · Lobsters
04. Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality · HuggingFace Blog
05. int a = 5; a = a++ + ++a; a = ? (2011) · Lobsters

Colophon · Maker C.1

Quentin Lecocq · @celdama

Fullstack dev · CRO freelance · Lille, FR

Lantern is a side-project — aggregation, AI scoring, weekly digest. Built with Next.js 16, Drizzle, Neon & Claude. One maintainer.
