Tool · GitHub Trending — Python

p-e-w/heretic

Heretic is a fully automatic censorship-removal tool for transformer language models. It combines directional ablation with a TPE-based (Tree-structured Parzen Estimator) parameter optimizer to decensor models without post-training. It achieves refusal suppression similar to that of manually ablated models, but with substantially lower KL divergence from the original model. It supports a wide range of dense and multimodal models and provides research-oriented features such as residual plots and residual-geometry analysis.
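To make the terms in the summary concrete, here is a minimal NumPy sketch of directional ablation and of the KL divergence used to measure drift from the original model. It is an illustration under stated assumptions, not Heretic's actual code: the function names (`refusal_direction`, `ablate_direction`, `kl_divergence`), the `strength` parameter, and the toy data are all hypothetical. The sketch assumes the common difference-of-means recipe for the direction and a weight matrix that writes into the residual stream.

```python
import numpy as np


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (assumed recipe)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight, direction, strength=1.0):
    """Subtract the component of the matrix output that lies along `direction`.

    weight: (d_model, d_in) matrix whose output is added to the residual stream.
    strength: hypothetical knob; 1.0 removes the component entirely.
    """
    projector = np.outer(direction, direction)
    return weight - strength * (projector @ weight)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability vectors."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


# Toy data: random "activations" with a signal planted on axis 0.
rng = np.random.default_rng(0)
d_model = 8
harmless = rng.normal(size=(64, d_model))
harmful = rng.normal(size=(64, d_model))
harmful[:, 0] += 3.0

direction = refusal_direction(harmful, harmless)
weight = rng.normal(size=(d_model, d_model))
ablated = ablate_direction(weight, direction)

# After full ablation the matrix can no longer write along `direction`.
residual = np.abs(direction @ ablated).max()
```

In a setup like this, an optimizer such as TPE would search over ablation parameters (here, only the hypothetical `strength`), trading off how many refusals remain against the KL divergence between the original and modified model's output distributions.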

Published APR 30, 2026 · ★★★★★
Read the source: github.com/p-e-w/heretic
Source: GitHub Trending — Python
Ingested: APR 30, 2026 · 04:08
Editorial score: 5.0 / 5