tool · GitHub Trending — Python
p-e-w/heretic
Heretic is a fully automatic censorship-removal tool for transformer language models. It combines directional ablation with a TPE-based optimizer to decensor models without post-training. It achieves refusal suppression comparable to that of manually ablated models, but with substantially lower KL divergence from the original model. It supports a wide range of dense and multimodal models, and provides research-oriented features such as residual plots and residual-geometry analysis.
Published 30 APR. 2026 · ★★★★★
Read the source: github.com/p-e-w/heretic
- Source: GitHub Trending — Python
- Ingested: 30 APR. 2026 · 04:08
- Editorial score: 5.0 / 5
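
To make the summary's two key terms concrete, here is a minimal sketch of directional ablation and of the KL divergence used to measure how far a modified model drifts from the original. It is an illustration in plain Python, not Heretic's code: the function names `ablate_direction` and `kl_divergence`, the `strength` parameter, and the toy matrix are all hypothetical. The sketch assumes a weight matrix that writes into the residual stream and a single "refusal direction" vector; the TPE-based search over ablation parameters is not shown.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(weight, direction, strength=1.0):
    """Return W - strength * r r^T W, with r the unit-normalized direction.

    weight is a list of rows (d_out x d_in). With strength=1.0 the output of
    the matrix has no component left along r. `strength` is a hypothetical
    knob of the kind an optimizer could tune.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction r.
    proj = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - strength * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy example (made-up numbers): remove the first output axis from W.
W = [[1.0, 2.0], [3.0, 4.0]]
W_ablated = ablate_direction(W, direction=[1.0, 0.0])
```

In this toy case the first row of `W_ablated` becomes zero and the second row is unchanged, since only the component along the chosen direction is removed. A lower `kl_divergence` between the original and modified models' next-token distributions on harmless prompts indicates that less of the original behavior was disturbed.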