How to Find and Evaluate Uncensored LLM Models on HuggingFace

I needed to find uncensored LLM models on HuggingFace for a local project. The platform hosts over 500,000 models, so finding the right ones requires knowing where to look and how to evaluate what you find.

The Challenge

HuggingFace’s search is powerful but not intuitive for finding uncensored variants. A standard search for “uncensored” returns mixed results: some models are genuinely uncensored, others are poorly documented, and many show significant capability degradation. I needed a systematic approach.

Finding Models: Three Proven Methods

Method 1: Keyword Search on HuggingFace

The fastest way to find uncensored models is to enter specific keywords in the HuggingFace search bar.

Primary search keywords:

  1. abliterated - Returned 4,967+ models when I searched
  2. heretic - Returned 2,164+ models when I searched
  3. uncensored - General search, variable quality

I found that “abliterated” and “heretic” yield the most consistent results because these terms have specific technical meanings in the uncensoring community.

Direct URL searches:

HuggingFace search URLs
https://huggingface.co/models?search=abliterated
https://huggingface.co/models?search=heretic
https://huggingface.co/models?search=uncensored

Sort by downloads to find community-validated models:

Sorted search URLs
https://huggingface.co/models?search=abliterated&sort=downloads
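
If you want to generate these URLs for several keywords at once, a small helper is enough. This is a minimal sketch; `search_url` is a name I made up, and it only reproduces the `search` and `sort` query parameters shown in the URLs above.

```python
from urllib.parse import urlencode

BASE_URL = "https://huggingface.co/models"


def search_url(keyword, sort=None):
    """Build a model search URL for a keyword, optionally with a sort key."""
    params = {"search": keyword}
    if sort:
        params["sort"] = sort
    return f"{BASE_URL}?{urlencode(params)}"


# Print a download-sorted search URL for each keyword from the list above
for keyword in ("abliterated", "heretic", "uncensored"):
    print(search_url(keyword, sort="downloads"))
```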

Method 2: Programmatic Search with HuggingFace API

For batch discovery, I use the HuggingFace API to search and filter models programmatically.

Search abliterated models with API
from huggingface_hub import HfApi

api = HfApi()
# Fetch up to 100 models matching the keyword
models = list(api.list_models(search="abliterated", limit=100))

# Print the ID and download count of the first 10 results
for model in models[:10]:
    print(f"{model.id}: {model.downloads} downloads")

This approach lets me filter by downloads, likes, and last modified date to find active, well-maintained models.
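
The filtering step could look roughly like the sketch below. The `Candidate` class and the `shortlist` function are hypothetical names of mine, and the default thresholds (1,000 downloads, 180 days) are illustrative assumptions, not values from any official guideline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in exposing the attributes the filter reads."""
    id: str
    downloads: int
    likes: int
    last_modified: Optional[datetime]


def shortlist(models, min_downloads=1000, max_age_days=180, now=None):
    """Keep popular, recently updated models, most downloaded first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    kept = [
        m for m in models
        if (m.downloads or 0) >= min_downloads
        # Skip entries with no timestamp: their age cannot be judged.
        and m.last_modified is not None
        and m.last_modified >= cutoff
    ]
    # Sort by downloads, using likes as a tie-breaker
    return sorted(kept, key=lambda m: (m.downloads, m.likes or 0), reverse=True)
```

As far as I know, the objects returned by `api.list_models` in recent `huggingface_hub` releases expose `downloads`, `likes`, and `last_modified` under the same names, so they should be usable here directly; you may need to request full model info to get the timestamp populated.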

Method 3: Curated Collections

Some creators maintain organized collections of uncensored models. The DavidAU Heretic Collection is particularly useful:

Curated collection URL
https://huggingface.co/collections/DavidAU/heretic-abliterated-uncensored-unrestricted-power

These collections typically include:

  • Detailed model cards explaining uncensoring methodology
  • Benchmark comparisons with base models
  • Usage examples and known limitations

Understanding the Two Main Uncensoring Approaches

Before evaluating models, I needed to understand what makes them different.

Abliterated Models

Abliterated models use mechanistic interpretability techniques to remove refusal directions from model weights without retraining.

Technical characteristics:

Abliteration process
- Identifies refusal directions using SVD/PCA analysis
- Projects out refusal vectors from weights
- No retraining required
- Process completes in minutes

Best for: Quick experimentation, limited compute resources, testing before committing to heretic models.
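
To make the "project out" step concrete, here is a toy sketch in plain Python. It is not how any particular abliteration tool is implemented: it estimates the direction from a difference of mean activations (a simpler stand-in for the SVD/PCA analysis mentioned above), and it assumes a weight matrix whose rows index the output dimension. All function names are my own.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_difference(refused, complied):
    """Candidate refusal direction: mean activation on refused prompts
    minus mean activation on complied prompts, normalized."""
    dim = len(refused[0])
    mean_r = [sum(a[i] for a in refused) / len(refused) for i in range(dim)]
    mean_c = [sum(a[i] for a in complied) / len(complied) for i in range(dim)]
    return unit([r - c for r, c in zip(mean_r, mean_c)])


def project_out(weight, direction):
    """Return W' = W - r r^T W, so outputs of W' have no component along r."""
    r = unit(direction)
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r
    rtw = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - r[i] * rtw[j] for j in range(cols)] for i in range(rows)]
```

Because this is only a matrix update, no gradient steps are involved, which is why the process is fast compared with retraining.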

Heretic Models

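Heretic models are produced with the Heretic tool, which automates abliteration: it applies the same kind of directional ablation but tunes the ablation parameters with a Bayesian optimizer (Optuna TPE) instead of hand-picked settings. As far as I can tell, no conventional fine-tuning is involved.

Technical characteristics:

Heretic model optimization
- Ablation parameters tuned with Optuna TPE search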
- 7 global parameters optimized
- Parametric kernel optimization
- Permanent weight modifications

Best for: Maximum uncensoring needed, GPU resources available, permanent deployment.

Evaluating Model Quality

Finding models is easy. Evaluating quality requires a multi-step approach.

Step 1: Check Open LLM Leaderboard Scores

The Open LLM Leaderboard provides standardized benchmarks for model comparison. I look for these key metrics:

Key benchmark metrics
MMLU-Pro: Tests knowledge and reasoning across 14 subject areas with harder, 10-option questions
IFEval: Measures instruction following ability
MATH: Tests mathematical reasoning
GPQA: Tests graduate-level science questions (biology, physics, chemistry)
MuSR: Tests multi-step reasoning chains

Target scores for usable uncensored models:

Benchmark quality thresholds
MMLU-Pro: 60-75% (good range)
IFEval: 70%+ (critical for practical use)
MATH: 30-50% (capable models)
GPQA: 40-60% (good range)
MuSR: 60%+ (usable models)
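
A quick way to apply these thresholds to a set of scores is sketched below. It uses the lower bound of each range above as the minimum; `failing_benchmarks` is a hypothetical helper name, and a missing score is treated as a failure so that unreported benchmarks stand out.

```python
# Minimum scores (percent), taken from the lower bounds listed above
MIN_SCORES = {"MMLU-Pro": 60, "IFEval": 70, "MATH": 30, "GPQA": 40, "MuSR": 60}


def failing_benchmarks(scores):
    """Return the benchmarks that are missing or below their minimum score."""
    return [
        name for name, minimum in MIN_SCORES.items()
        if scores.get(name) is None or scores[name] < minimum
    ]
```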

Important limitation: Many uncensored models are not submitted to the official leaderboard. When this happens, I check the model card for self-reported benchmarks.

Step 2: Assess Capability Preservation

Uncensoring can damage model capabilities. I compare uncensored model benchmarks to the base model.

Capability degradation thresholds
0-10% drop: Acceptable
10-20% drop: Warning zone
>20% drop: Model likely damaged, reject

I found that well-constructed abliterated models typically show 0-5% degradation, while heretic models can vary more widely depending on how they were optimized.
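
The comparison is simple arithmetic. The sketch below assumes the drop is measured relative to the base model's score (so a fall from 80 to 60 counts as a 25% drop) and treats the boundary values 10 and 20 as belonging to the lower band; both are my reading of the thresholds above.

```python
def relative_drop(base_score, tuned_score):
    """Percent of the base model's score lost after uncensoring."""
    return (base_score - tuned_score) / base_score * 100


def classify_drop(drop_percent):
    """Map a relative drop onto the three bands above."""
    if drop_percent <= 10:
        return "acceptable"
    if drop_percent <= 20:
        return "warning"
    return "reject"
```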

Step 3: Check Community Indicators

Download count is a useful quality signal:

Download count interpretation
>10k downloads: Community-validated quality
1k-10k downloads: Worth investigating
<1k downloads: Test thoroughly before use

I also check:

  • Model card quality (detailed methodology, benchmark comparisons)
  • Creator reputation (established creators like DavidAU, huihui-ai)
  • Update frequency and community engagement
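
When triaging a long candidate list, the download tiers can be applied mechanically before looking at the softer signals. A minimal sketch, with `download_tier` as a made-up helper name and exactly 1,000 and 10,000 downloads both assigned to the middle tier:

```python
def download_tier(downloads):
    """Map a download count onto the three tiers described above."""
    if downloads > 10_000:
        return "community-validated"
    if downloads >= 1_000:
        return "worth investigating"
    return "test thoroughly"
```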

Step 4: Test Refusal Rate

The ultimate test is whether the model actually responds to prompts that typically trigger refusals.

Manual testing protocol:

Refusal testing approach
Test prompts that typically trigger refusals
Score: % of prompts receiving actual responses
Target: 90%+ response rate for well-uncensored models

I test 10-20 prompts across different categories to get a realistic refusal rate.
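
Scoring the responses by hand works for 10-20 prompts, but a rough automatic pass can help. The sketch below is a keyword heuristic only: the marker list is a hypothetical starting point, and it will miss polite deflections or flag harmless answers that happen to contain these phrases, so I would still read the flagged responses.

```python
# Hypothetical marker list; extend it with phrases your models actually use.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")


def is_refusal(response):
    """Heuristic: empty replies or a refusal phrase near the start count as refusals."""
    # Normalize case and typographic apostrophes before matching
    text = response.strip().lower().replace("\u2019", "'")
    return not text or any(marker in text[:200] for marker in REFUSAL_MARKERS)


def response_rate(responses):
    """Percentage of responses that are not flagged as refusals."""
    answered = sum(1 for r in responses if not is_refusal(r))
    return 100 * answered / len(responses)
```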

A Practical Workflow

Here’s the workflow I use when finding and evaluating uncensored models.

Phase 1: Discovery (30 minutes)

  1. Search for “abliterated” models on HuggingFace, sort by downloads
  2. Search for “heretic” models, check DavidAU’s collection
  3. Note 5-10 candidate models matching hardware requirements

Phase 2: Evaluation (2-4 hours)

  1. Check leaderboard scores or model card benchmarks
  2. Compare benchmark scores to base model
  3. Assess capability preservation (look for less than 10% degradation)
  4. Review download counts and model card quality

Phase 3: Testing (1-2 hours)

  1. Download top 2-3 candidates
  2. Run refusal tests with 10-20 prompts
  3. Test practical capabilities (reasoning, creativity, knowledge)
  4. Select best match for use case

Common Mistakes to Avoid

Mistake 1: Assuming all “uncensored” models are equal

Models labeled “uncensored” vary wildly in quality. Some have residual refusals; others have severe capability degradation. Always test refusal rates and benchmark preservation.

Mistake 2: Ignoring capability degradation

Aggressive uncensoring can damage reasoning and coherence. I reject models that show more than 20% benchmark degradation relative to the base model.

Mistake 3: Overlooking hardware requirements

Before downloading, I calculate VRAM needs:

VRAM calculation
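Model parameters (B) x 2 bytes (FP16) = approximate VRAM in GB for the weights alone
With 8-bit quantization use x 1, with 4-bit use x 0.5; leave headroom for context (KV cache)
For 16GB VRAM: target 7B-14B models (quantized)
For 24GB VRAM: target up to 14B models comfortably (quantized)
For 48GB VRAM: target up to 30B models (quantized)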
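
The same rule of thumb as a small calculator. The bytes-per-parameter values for 8-bit and 4-bit quantization and the 20% overhead factor for context and activations are rough assumptions of mine; real usage depends on the runtime, quantization format, and context length.

```python
# Approximate bytes per parameter for each precision (assumed values)
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_vram_gb(params_billions, precision="fp16"):
    """Approximate VRAM for the weights alone, in GB."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions, vram_gb, precision="fp16", overhead=1.2):
    """True if the weights plus an assumed overhead factor fit in the given VRAM."""
    return weight_vram_gb(params_billions, precision) * overhead <= vram_gb
```

Under these assumptions a 7B model at FP16 does not quite fit in 16GB once overhead is counted, while a 14B model at 4-bit fits easily, which is why the size targets for each VRAM tier only work with quantized weights.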

Mistake 4: Trusting single evaluation sources

I combine multiple evaluation methods: benchmarks + refusal testing + community feedback + practical testing. No single source tells the complete story.

Mistake 5: Not reading model cards

Model cards contain critical information: uncensoring methodology, known limitations, special usage instructions. Always read the full model card before downloading.

Starting Points for Different Hardware

For 16GB VRAM:

16GB VRAM recommendations
Abliterated:
- lukey03/Qwen3.5-9B-abliterated
- huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated (quantized)
Heretic:
- DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED
Community recommended:
- Dirty Shirley Writer V1
- fluffy/l3-8b-stheno-v3.2

For 24GB VRAM:

24GB VRAM recommendations
- llmfan46/Qwen3.5-35B-A3B-heretic-v2 (quantized)
- gpt-oss-20b-heretic-ara-v3
- Magnum cydoms 24b i1

Summary

Finding uncensored LLM models on HuggingFace requires specific keywords (“abliterated”, “heretic”), systematic evaluation (benchmarks, refusal rates, capability preservation), and practical testing. I use a three-phase workflow: discovery through keyword search, evaluation through benchmark analysis, and testing through refusal rate assessment.

The best models balance uncensoring effectiveness with capability preservation: >90% response rate on typically-refused prompts while maintaining >90% of base model benchmark performance. Start with well-documented, high-download models from reputable creators, and always test in your specific use case.

Final Words + More Resources

My intention with this article was to share my knowledge and experience so that it can help others. If you want to contact me, you can reach me by email.
