About & method.
Lantern is an AI-powered aggregator. It reads RSS, scrapes a handful of sites, scores each article with an LLM, and delivers a digestible selection. No ads, no trackers, no opaque algorithm. This page documents exactly what happens between a source URL and the article that lands in your feed.
the origin
I used to open Hacker News, Lobsters, ten RSS feeds, and three Substacks every morning with my coffee. Sixty tabs. I'd lose an hour a day scrolling, and still miss the good stuff because it drowned in the noise.
Lantern is what I wanted to read instead. One page, already filtered. An agent that has already read the week's 312 articles for me, dropped the duplicates, scored the rest, and handed me a short selection with honest summaries — no marketing, no rephrased clickbait.
I build it solo, evenings and weekends. It's an author project, not a startup. That means three things:
- No ads, no trackers. Lantern doesn't follow you, doesn't sell your data, and loads zero third-party scripts.
- No growth hacking. No push notifications, no gamification, no "streaks". You read when you want.
- Explicit curation. Sources are public, scoring is documented below, and you can always turn off the AI to see the raw feed.
from a URL to the feed
Every article that lands in your feed passes through five steps. Most take a few seconds; LLM scoring is the bottleneck. A full run executes every 15 minutes for fast feeds (HN, Lobsters) and every hour for the rest.
+-- 01 -----+    +-- 02 ------+    +-- 03 ------+    +-- 04 -----+    +-- 05 -----+
|   FETCH   | -> |  EXTRACT   | -> |   DEDUPE   | -> |   SCORE   | -> |  PUBLISH  |
+-----------+    +------------+    +------------+    +-----------+    +-----------+
  RSS/scrape      readability       simhash           gpt-5-nano       ranking + feed
  ~2s             ~0.4s             ~0.1s             ~7s              instant
What follows details each step — what it gets, what it produces, what can fail.
Raw content retrieval
A cron job hits the 52 sources via RSS, Atom, GitHub releases, and two homemade scrapers for sites without feeds. Each response is cached 4h to avoid hammering upstream servers.
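As a sketch, the 4-hour cache could look like the following (illustrative Python, not Lantern's actual code: the in-memory store, the `fetch_cached` name, and the injected fetcher are all assumptions):

```python
import time

CACHE_TTL = 4 * 3600  # each source response is cached 4h to avoid hammering upstream

# url -> (fetched_at, body); a real deployment would persist this, not keep it in memory
_cache = {}

def fetch_cached(url, fetch):
    """Return the cached body if younger than CACHE_TTL, otherwise call fetch(url)."""
    now = time.time()
    hit = _cache.get(url)
    if hit is not None and now - hit[0] < CACHE_TTL:
        return hit[1]
    body = fetch(url)  # real code would use an HTTP client with timeouts here
    _cache[url] = (now, body)
    return body
```

Within a 4-hour window, repeated runs hit the cache instead of the source; only the first call reaches the network.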
Clean content extraction
The readability algorithm (a port of Mozilla's Readability) splits content from chrome (header, sidebar, related links). For RSS feeds that already deliver clean markup, we skip this step. ~3% of extractions fail — the article is kept but flagged "incomplete extract" on its detail page.
Semantic deduplication
A simhash is computed over the first 4000 characters. If an article ingested in the last 14 days has a hash at Hamming distance < 8, we treat the newer one as a cross-post (the same item on HN and Lobsters, for instance) and keep the earliest source — the others are marked "alias".
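A minimal simhash sketch, under the constraints above (first 4000 characters, Hamming distance < 8). The word-shingle size and the blake2b hash are illustrative choices, not necessarily Lantern's:

```python
import hashlib

def simhash(text, bits=64):
    """64-bit simhash over word 3-shingles of the first 4000 characters."""
    words = text[:4000].lower().split()
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * bits
    for s in shingles:
        h = int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    # bit b of the fingerprint is 1 when the weighted vote is positive
    return sum(1 << b for b in range(bits) if weights[b] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

def is_cross_post(a, b, threshold=8):
    """Fingerprints at Hamming distance < threshold are treated as the same article."""
    return hamming(a, b) < threshold
```

Near-duplicates (the same story reposted with a tweaked intro) land within a few bits of each other, while unrelated articles sit around distance 32 on average.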
LLM scoring & tagging
GPT-5-nano receives the title, the excerpt (max 2000 chars), and a short system prompt. It returns: a 1-to-5 score on 3 axes (depth, novelty, applicability), a type ([article] / [tool] / [agent] / [mcp]), 2 to 5 tags, and a ~80-word summary. See §03 How I score for the full prompt.
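Because the model is asked for strict JSON, the response can be validated on ingestion. A hedged sketch of such a check (field names follow the contract above; the `parse_score` name and the error handling are illustrative):

```python
import json

AXES = ("depth", "novelty", "applicability")
TYPES = {"article", "tool", "agent", "mcp"}

def parse_score(raw):
    """Validate the strict-JSON scoring payload described in the prompt."""
    data = json.loads(raw)
    for axis in AXES:
        value = data.get(axis)
        if not (isinstance(value, int) and 1 <= value <= 5):
            raise ValueError(f"{axis}: expected an integer from 1 to 5")
    if data.get("type") not in TYPES:
        raise ValueError("type: expected one of " + ", ".join(sorted(TYPES)))
    tags = data.get("tags")
    if not (isinstance(tags, list) and 2 <= len(tags) <= 5):
        raise ValueError("tags: expected 2 to 5 entries")
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary: expected a string")
    return data
```

A malformed response (a score of 9, a missing axis, one lonely tag) is rejected instead of silently polluting the feed.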
Publication & ranking
The article joins the database. Feed ranking combines editorial score × recency × social signal (HN points when available). Articles scored below 3★ are kept but hidden from the default feed (visible via the "all" filter), consistent with the grid below. Once a week, the digest is recompiled from the last 7 days.
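The ranking text names the three factors but not their shapes, so here is one plausible reading (the 48-hour half-life and the log dampening of HN points are assumptions for illustration, not Lantern's exact formula):

```python
import math

def rank(score, age_hours, hn_points=0, half_life_hours=48.0):
    """Editorial score * recency decay * social signal (all shapes assumed)."""
    recency = 0.5 ** (age_hours / half_life_hours)  # halves every half_life_hours
    social = 1.0 + math.log1p(hn_points) / 10.0     # points help, sublinearly
    return score * recency * social
```

Under these assumptions, a 4★ article ranks at 4.0 when fresh and 2.0 after one half-life; HN points nudge it up without letting virality dominate the editorial score.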
the prompt, the grid
Scoring is the only place where AI decides something on your behalf. To keep it honest, here's the exact system prompt and the grid it applies. If you disagree with a score, you can re-score it yourself on the article page — your personal score takes precedence over the editorial one in your future digests.
The grid
- 5 ★ — Essential. Either a deep paper, a tool that changes a workflow, or a rare experience report.
- 4 ★ — Very good. You learn something new and actionable. Most Picks of the Week come from here.
- 3 ★ — Solid but familiar. Well written, known topic. Quick hits often come from here.
- 2 ★ — Marginal. Basic tutorial, news without angle, promotional post. Hidden by default.
- 1 ★ — Noise. SEO bait, badly sourced repost, hype-piece. Hidden.
The system prompt (excerpt)
# role: editor
You are an editor for a tech newsletter aimed at experienced devs
and AI engineers. Score the following article on a 1-5 scale based
on technical depth, novelty, and applicability.
# anti-patterns (auto -1)
- listicle without substance ("10 X you must know")
- pure news without analysis
- vendor-pushed content without independent angle
- AI-generated prose (detect via burstiness + perplexity)
# bonus (auto +1, max once)
- post-mortem with concrete numbers
- benchmark against alternatives
- author has hands-on production experience
Return strict JSON: { depth, novelty, applicability, type, tags, summary }
The full prompt is versioned and frozen for at least 3 months, so scores stay comparable between weeks. When the version is bumped (rarely), it's noted in the changelog.
What I don't do
- No per-user personalization. Everyone sees the same editorial score. Lantern doesn't learn from your clicks.
- No A/B testing on titles. The displayed title is the source's, period.
- No commercial boost. No source pays to appear. No source is excluded for criticizing us.
the usual questions
Q.01 Lantern is free. What's the catch?
No catch. It's a personal project, and the infra cost (~$25/mo) is manageable solo. If the audience grows, I'll probably open an optional "supporter" tier to help cover LLM compute, but the core will stay free. No disguised freemium, no paywall appearing one day.
Q.02 Why no mobile app?
The site is responsive and reads fine on mobile. A native app would require a store, reviews, a release cycle — too much for a solo project. If you want something on your home screen, add the site as a PWA from Safari/Chrome.
Q.03 How do I suggest a source?
At the bottom of the Sources page: "Add a source". You paste the URL, I check it manually (quality, frequency, fit), and add it if it fits the feed. You can also ping me on X or by email.
Q.04 Is my data shared?
No. No third-party tracker, no pixel, zero third-party script loaded on the site. No analytics in place for now — I'll probably add one (Plausible or self-hosted) the day it's useful, and it'll be anonymous and documented here. Storage-wise, your favorites and notes live in a Postgres database (Neon, hosted in Europe), accessible only by you via your magic link email. No third party has access. You can export to JSON from /favorites and delete everything at any time.
Q.05 Why a single model instead of switching?
Because the score must stay comparable from week to week. Switching models subtly changes the grid and pollutes historical comparisons. I benchmark the model against competitors every 3 months; I only migrate if the quality gap is clear, and I note the migration in the changelog.
Q.06 Is the digest AI-generated?
The selection is editorial (I open the 70 articles at 4★+ from the week and pick 7 by hand). The short summary on the Pick of the Week is generated by LLM — flagged [SYNTH] on the page. Everything else is written or compiled manually.
Q.07 Can Lantern miss an important article?
Yes. Three cases: (1) the source isn't covered — if the author posts on a personal blog not indexed by Lantern, I won't see it (suggesting a source via the Sources page is the best fix); (2) the LLM under-rates — a short but dense post, or a dry academic paper, can land at 3★ when it deserved 4 (see §03 on the median 0.4★ gap); (3) a cross-post is filed as an alias — if Lantern sees the same article on HN and Lobsters, it keeps the first source seen and marks the other as an alias. For all three, the "show all" filter in the feed surfaces hidden and aliased articles. Lantern isn't exhaustive — it's filtered. If you want everything, keep Feedly on the side.