Vol. I · About

How Lantern reads the web for you · doc v0.6 · updated June 06, 2026

Edited by Q. Lecocq · Lille, FR

§ Methodology · Why · how · for whom

About & method.

Lantern is an AI-powered aggregator. It reads RSS, scrapes a handful of sites, scores each article with an LLM, and delivers a digestible selection. No ads, no trackers, no opaque algorithm. This page documents exactly what happens between a source URL and the article that lands in your feed.

§ 01 WHY LANTERN

the origin

~2 min

I used to open Hacker News, Lobsters, ten RSS feeds, and three Substacks every morning with my coffee. Sixty tabs. I'd lose an hour a day scrolling, and still miss the good stuff because it drowned in the noise.

Lantern is what I'd rather have read instead. One page, already filtered. An agent that already read the 312 articles of the week for me, dropped the duplicates, scored the rest, and handed me a short selection with honest summaries — no marketing, no rephrased clickbait.

I build it solo, evenings and weekends. It's an author project, not a startup. That means three things:

No ads, no trackers. Lantern doesn't follow you, doesn't sell your data, and loads zero third-party scripts.
No growth hacking. No push notifications, no gamification, no "streak". You read when you want.
Explicit curation. Sources are public, scoring is documented below, and you can always turn off the AI to see the raw feed.

§ 02 THE PIPELINE

from a URL to the feed

5 steps · ~9.5 s cumulative per article

Every article that lands in your feed has passed through five steps. Most take a couple of seconds at most; LLM scoring is the bottleneck. A full run executes every 15 minutes for fast feeds (HN, Lobsters) and every hour for the rest.

+-- 01 -----+    +-- 02 ------+    +-- 03 ------+    +-- 04 -----+    +-- 05 -----+
|  FETCH    | -> |  EXTRACT   | -> |  DEDUPE    | -> |  SCORE    | -> |  PUBLISH  |
+-----------+    +------------+    +------------+    +-----------+    +-----------+
RSS/scrape       readability       vector hash       gpt-5-nano     ranking + feed
   ~2s             ~0.4s             ~0.1s              ~7s            instant

What follows details each step — what it gets, what it produces, what can fail.

01 FETCH

Raw content retrieval

A cron job hits the 52 sources via RSS, Atom, GitHub releases, and two homemade scrapers for sites without feeds. Each response is cached 4h to avoid hammering upstream servers.

Stack: Node + got

Cadence: 15 min

Timeout: 10s

Failure: retry x3
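
As an illustration only, the cache-then-retry behaviour described above could be sketched in TypeScript as follows. The names `CacheEntry`, `isFresh`, `fetchSource`, and the injected `fetchOnce` callback are hypothetical. Reading "retry x3" as one initial attempt plus three retries is an assumption, and the 10s timeout is assumed to be enforced inside `fetchOnce` (for example by the HTTP client).

```typescript
// Hypothetical shapes for illustration; not Lantern's actual code.
interface CacheEntry {
  body: string;
  fetchedAt: number; // epoch milliseconds
}

const CACHE_TTL_MS = 4 * 60 * 60 * 1000; // "cached 4h"
const MAX_RETRIES = 3; // "retry x3" (assumed: retries after the first attempt)

// A cached response is reusable while it is younger than the TTL.
function isFresh(entry: CacheEntry | undefined, now: number): boolean {
  return entry !== undefined && now - entry.fetchedAt < CACHE_TTL_MS;
}

// fetchOnce is injected so the sketch does not depend on any HTTP client.
async function fetchSource(
  url: string,
  cache: Map<string, CacheEntry>,
  fetchOnce: (url: string) => Promise<string>,
  now: number = Date.now(),
): Promise<string> {
  const cached = cache.get(url);
  if (cached !== undefined && isFresh(cached, now)) {
    return cached.body; // spare the upstream server
  }

  let lastError: unknown = new Error("no attempt made");
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    try {
      const body = await fetchOnce(url);
      cache.set(url, { body, fetchedAt: now });
      return body;
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}
```

Injecting `fetchOnce` keeps the retry and cache logic testable without touching the network; a real job would likely add a delay between attempts.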

02 EXTRACT

Clean content extraction

The Readability algorithm (Mozilla port) separates the article content from the page chrome (header, sidebar, related links). For RSS feeds that already deliver clean markup, this step is skipped. About 3% of extractions fail; in that case the article is kept but flagged as "incomplete extract" in the article detail.

Lib: @mozilla/readability

Output: title, body, lang

Cap: 8000 chars
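
The skip, fallback, and cap rules above can be sketched as a small function. This is a simplified illustration, not the actual implementation: the `Parsed`, `Extracted`, and `Extractor` types and the `extractArticle` function are hypothetical, and the extractor is injected as a callback rather than calling the Readability library directly.

```typescript
// Hypothetical shapes for illustration; not Lantern's actual code.
interface Parsed {
  title: string;
  body: string;
  lang: string;
}

interface Extracted extends Parsed {
  incomplete: boolean; // surfaces as "incomplete extract"
}

// Stand-in for a Readability-style extractor; returns null on failure.
type Extractor = (html: string) => Parsed | null;

const BODY_CAP = 8000; // "Cap: 8000 chars"

function extractArticle(
  feedItem: Parsed,
  isCleanFeed: boolean,
  extractor: Extractor,
): Extracted {
  const cap = (text: string): string => text.slice(0, BODY_CAP);

  // Feeds that already ship clean markup skip extraction entirely.
  if (isCleanFeed) {
    return { ...feedItem, body: cap(feedItem.body), incomplete: false };
  }

  const parsed = extractor(feedItem.body);
  if (parsed === null) {
    // Extraction failed: keep the article, but flag it.
    return { ...feedItem, body: cap(feedItem.body), incomplete: true };
  }
  return { ...parsed, body: cap(parsed.body), incomplete: false };
}
```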

03 DEDUPE

Semantic deduplication

A simhash is computed on the first 4000 characters. If an article ingested in the last 14 days has a hash at a Hamming distance below 8, it is treated as a cross-post (the same item on HN and Lobsters, for instance): the earliest source is kept and the others are marked "alias".

Method: simhash 64-bit

Window: 14d

Threshold: distance < 8
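
To make the technique concrete, here is a minimal simhash sketch under stated assumptions. Word-level tokens, the FNV-1a token hash (run twice with different seeds to get 64 bits), and the bit-array representation are all illustrative choices; the page does not say how Lantern tokenizes or hashes. The 14-day window is left out, since it is a database lookup rather than part of the hash.

```typescript
const SIMHASH_BITS = 64;
const MAX_CHARS = 4000; // only the first 4000 characters are hashed
const THRESHOLD = 8; // cross-post when distance < 8

// 32-bit FNV-1a; the seed lets us derive two independent 32-bit halves.
function fnv1a32(text: string, seed: number): number {
  let h = seed >>> 0;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

// Returns the fingerprint as an array of 64 bits (0 or 1).
function simhash(text: string): number[] {
  const weights = new Array<number>(SIMHASH_BITS).fill(0);
  const tokens = text
    .slice(0, MAX_CHARS)
    .toLowerCase()
    .split(/\W+/)
    .filter((t) => t.length > 0);

  for (const token of tokens) {
    const halves = [fnv1a32(token, 0x811c9dc5), fnv1a32(token, 0x9e3779b9)];
    for (let bit = 0; bit < SIMHASH_BITS; bit++) {
      const word = halves[bit >> 5];
      // Each token votes +1 or -1 on every bit position.
      weights[bit] += ((word >>> (bit & 31)) & 1) === 1 ? 1 : -1;
    }
  }
  // A bit is set when the positive votes win.
  return weights.map((w) => (w > 0 ? 1 : 0));
}

// Number of bit positions where the two fingerprints differ.
function hammingDistance(a: number[], b: number[]): number {
  let distance = 0;
  for (let i = 0; i < SIMHASH_BITS; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

function isCrossPost(a: string, b: string): boolean {
  return hammingDistance(simhash(a), simhash(b)) < THRESHOLD;
}
```

Because near-identical texts share most tokens, most bit votes agree and the fingerprints differ in only a few positions, which is what the distance threshold exploits.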

04 SCORE

LLM scoring & tagging

GPT-5-nano receives the title, the excerpt (max 2000 chars), and a short system prompt. It returns: a 1-to-5 score on 3 axes (depth, novelty, applicability), a type ([article] / [tool] / [agent] / [mcp]), 2 to 5 tags, and a ~80-word summary. See §03 How I score for the grid and an excerpt of the prompt.

Model: gpt-5-nano

Tokens in/out: ~700/500

Latency p50: 7.2s

Format: JSON schema
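
Since the model's reply is structured JSON, it can be checked before anything is stored. The validator below is a sketch under assumptions: `ScoreResult` and `parseScore` are hypothetical names, the type values are assumed to arrive as bare strings without brackets, and the summary length is not enforced.

```typescript
// Hypothetical result shape, mirroring the fields listed above.
type ArticleType = "article" | "tool" | "agent" | "mcp";

interface ScoreResult {
  depth: number;
  novelty: number;
  applicability: number;
  type: ArticleType;
  tags: string[];
  summary: string;
}

const ARTICLE_TYPES: readonly string[] = ["article", "tool", "agent", "mcp"];

// Each axis must be a number between 1 and 5.
function isAxisScore(value: unknown): boolean {
  return typeof value === "number" && value >= 1 && value <= 5;
}

// Returns the parsed result, or null when the reply breaks the contract.
function parseScore(raw: string): ScoreResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const record = data as Record<string, unknown>;
  const type = record["type"];
  const tags = record["tags"];

  const valid =
    isAxisScore(record["depth"]) &&
    isAxisScore(record["novelty"]) &&
    isAxisScore(record["applicability"]) &&
    typeof type === "string" &&
    ARTICLE_TYPES.indexOf(type) !== -1 &&
    Array.isArray(tags) &&
    tags.length >= 2 &&
    tags.length <= 5 &&
    tags.every((tag) => typeof tag === "string") &&
    typeof record["summary"] === "string";

  return valid ? (data as ScoreResult) : null;
}
```

Rejecting a malformed reply outright, instead of patching it, keeps bad scores out of the ranking; what happens next (retry, skip) is left open here.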

05 PUBLISH

Publication & ranking

The article joins the database. Feed ranking combines editorial score × recency × social signal (HN points if available). Articles scoring 2 or below are kept but hidden from the default feed (visible via the "all" filter). Once a week, the digest is recompiled from the last 7 days.

DB: Neon · pg

Feed cache: 60s

Reindex: on insert
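
The page gives the ranking as a product of three factors but not the factors themselves, so the following is only one plausible shape. The 48-hour half-life, the logarithmic treatment of HN points, and every name in the block are assumptions made for illustration.

```typescript
// Hypothetical input shape; not Lantern's actual schema.
interface RankInput {
  editorialScore: number; // 1 to 5
  ageHours: number;
  hnPoints?: number; // absent when the article was never on HN
}

const HALF_LIFE_HOURS = 48; // assumption: weight halves every two days

// Exponential decay: 1 for a brand-new article, 0.5 after one half-life.
function recencyFactor(ageHours: number): number {
  return Math.pow(0.5, Math.max(0, ageHours) / HALF_LIFE_HOURS);
}

// Neutral (1) without a social signal; grows slowly with HN points.
function socialFactor(hnPoints?: number): number {
  if (hnPoints === undefined) return 1;
  return 1 + Math.log10(1 + Math.max(0, hnPoints));
}

// editorial score × recency × social signal
function rank(input: RankInput): number {
  return (
    input.editorialScore *
    recencyFactor(input.ageHours) *
    socialFactor(input.hnPoints)
  );
}
```

With multiplication, a weak factor drags the whole rank down, so an old article needs a high score or a strong social signal to stay near the top.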

§ 03 HOW I SCORE

the prompt, the grid

prompt v0.7 — stable since Feb 2026

Scoring is the only place where AI decides something on your behalf. To keep it honest, here's the exact system prompt and the grid it applies. If you disagree with a score, you can re-score it yourself on the article page — your personal score takes precedence over the editorial one in your future digests.

The grid

5 ★ — Essential. Either a deep paper, a tool that changes a workflow, or a rare experience report.
4 ★ — Very good. You learn something new and actionable. Most Picks of the Week come from here.
3 ★ — Solid but familiar. Well written, known topic. Quick hits often come from here.
2 ★ — Marginal. Basic tutorial, news without angle, promotional post. Hidden by default.
1 ★ — Noise. SEO bait, badly sourced repost, hype-piece. Hidden.

The system prompt (excerpt)

# role: editor
You are an editor for a tech newsletter aimed at experienced devs
and AI engineers. Score the following article on a 1-5 scale based
on technical depth, novelty, and applicability.

# anti-patterns (auto -1)
- listicle without substance ("10 X you must know")
- pure news without analysis
- vendor-pushed content without independent angle
- AI-generated prose (detect via burstiness + perplexity)

# bonus (auto +1, max once)
- post-mortem with concrete numbers
- benchmark against alternatives
- author has hands-on production experience

Return strict JSON: { depth, novelty, applicability, type, tags, summary }

The full prompt is versioned and frozen for at least 3 months, so scores stay comparable between weeks. When I bump the version (rare), it's noted in the changelog.

What I don't do

No per-user personalization. Everyone sees the same editorial score. Lantern doesn't learn from your clicks.
No A/B testing on titles. The displayed title is the source's, period.
No commercial boost. No source pays to appear. No source is excluded for criticizing us.

§ 04 FAQ

the usual questions

7 entries

Q.01 Lantern is free. What's the catch?

No catch. It's a personal project, and the infra cost (~$25/mo) is manageable solo. If the audience grows, I'll probably open an optional "supporter" tier to help cover LLM compute, but the core will stay free. No disguised freemium, no paywall appearing down the line.

Q.02 Why no mobile app?

The site is responsive and reads fine on mobile. A native app would require a store, reviews, a release cycle — too much for a solo project. If you want something on your home screen, add the site as a PWA from Safari/Chrome.

Q.03 How do I suggest a source?

Use "Add a source" at the bottom of the Sources page. You paste the URL, I check it manually (quality, frequency, fit), and I add it if it fits the feed. You can also ping me on X or by email.

Q.04 Is my data shared?

No. No third-party trackers, no pixels, zero third-party scripts. Full details are on the Privacy Policy page: what's collected, subprocessors (Vercel, Neon EU, Resend, OpenAI, GitHub, Google), retention, and your GDPR rights.

Q.05 Why a single model instead of switching?

Because the score must stay comparable from week to week. Switching models subtly changes the grid, and pollutes historical comparisons. I benchmark the model against competitors every 3 months; I only migrate if the quality gap is clear, and I note the migration in the changelog.

Q.06 Is the digest AI-generated?

The selection is editorial (I open the 70 articles at 4★+ from the week and pick 7 by hand). The short summary on the Pick of the Week is generated by LLM — flagged [SYNTH] on the page. Everything else is written or compiled manually.

Q.07 Can Lantern miss an important article?

Yes. Three cases: (1) the source isn't covered: if the author posts on a personal blog not indexed by Lantern, I won't see it (suggesting a source via the Sources page is the best fix); (2) the LLM under-rates: a short but dense post, or a dry academic paper, can land at 3★ when it deserved 4 (see §03 on the median 0.4★ gap); (3) a cross-post is filed as alias: if Lantern sees the same article on HN and Lobsters, it keeps the first source seen and marks the other as alias. For the last two, the "show all" filter in the feed surfaces hidden and aliased articles. Lantern isn't exhaustive; it's filtered. If you want everything, keep Feedly on the side.

§ 05 CHANGELOG

what moved

8 recent entries

v0.6 · May 02 · FEATURE · About page + 2-col editorial sign-in: UI refresh, shared design tokens.

v0.5 · May 01 · REFACTOR · Modular pipeline: extract/dedupe/score/publish split, schemas split (articles/scores/sources).

v0.4 · Apr 30 · SOURCE · Broader dev front+back sources.

v0.3.1 · Apr 29 · FEATURE · Weekly digest shipped via gpt-5 + admin dashboard /admin (7 sections).

v0.3 · Apr 29 · MIGRATION · Ingestion switch: opencode zen → OpenAI gpt-5-nano (quality + cost).

v0.2 · Apr 27 · FEATURE · Web content fetcher (Readability) + GitHub Trending RSS sources.

v0.1 · Apr 26 · RELEASE · Lantern bootstrap: Next.js 16, Drizzle/Neon, Auth.js magic link, LLM scoring, Vercel cron.

v0.0 · Apr 26 · INIT · Initial commit: Next.js scaffold + feature-based structure.

— The author

Built solo by Quentin Lecocq.

Fullstack dev by training, designer by accident, based in Lille. Lantern is my side project, in progress since November 2025. If you want to discuss a source to add, report a bug, or just chat about tech watch, I'm reachable everywhere.

Every reply is read. Promise.

[X][GitHub][LinkedIn][Mail]

+- colophon --------------+
| stack    : Next 16      |
| db       : Neon · pg    |
| host     : Vercel       |
| llm      : gpt-5-nano   |
| font     : Geist Mono   |
| since    : Apr 26 2026  |
+-------------------------+