BitNet Supported Models: Complete Guide to 1-bit LLMs You Can Run

Mar 19, 2026

Problem

I downloaded BitNet and tried to run a model I found on Hugging Face. It didn’t work:

ValueError: Unsupported model architecture
KeyError: 'model_name' not found in SUPPORTED_HF_MODELS

Turns out, BitNet only supports specific models, and the model I picked wasn’t on the list. This guide covers all the models you can actually use with BitNet and which kernel works best for your CPU.

The Short Answer

BitNet currently supports 9 models from Hugging Face:

Official Microsoft:
  - microsoft/BitNet-b1.58-2B-4T (2.4B params) - Recommended

Community Models:
  - 1bitLLM/bitnet_b1_58-large (0.7B params)
  - 1bitLLM/bitnet_b1_58-3B (3.3B params)
  - HF1BitLLM/Llama3-8B-1.58-100B-tokens (8.0B params)
  - tiiuae/Falcon3-1B-Instruct-1.58bit
  - tiiuae/Falcon3-7B-Instruct-1.58bit
  - tiiuae/Falcon3-10B-Instruct-1.58bit
  - tiiuae/Falcon-E-1B-Instruct
  - tiiuae/Falcon-E-3B-Instruct
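As a minimal sketch, the list above can be turned into an early check that fails before anything is downloaded. The names `SUPPORTED_REPOS` and `is_supported` are illustrative only; this is not BitNet's internal `SUPPORTED_HF_MODELS` table, just the nine repo IDs from this guide copied into a set.

```python
# Repo IDs copied from the list in this guide (assumption: the list is current).
# SUPPORTED_REPOS is a hypothetical name, not BitNet's own table.
SUPPORTED_REPOS = {
    "microsoft/BitNet-b1.58-2B-4T",
    "1bitLLM/bitnet_b1_58-large",
    "1bitLLM/bitnet_b1_58-3B",
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens",
    "tiiuae/Falcon3-1B-Instruct-1.58bit",
    "tiiuae/Falcon3-7B-Instruct-1.58bit",
    "tiiuae/Falcon3-10B-Instruct-1.58bit",
    "tiiuae/Falcon-E-1B-Instruct",
    "tiiuae/Falcon-E-3B-Instruct",
}


def is_supported(repo_id: str) -> bool:
    """Return True if repo_id exactly matches one of the listed models."""
    return repo_id.strip() in SUPPORTED_REPOS
```

The match is exact, so variants of a listed repo (for example one with a different suffix) are reported as unsupported and need their own entry.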

But there’s a catch: kernel support varies by CPU architecture. Let me explain.

Understanding Kernels

BitNet uses specialized kernels for inference, and not all kernels work on all CPUs:

i2_s  - Integer 2-bit symmetric quantization
        Good balance of speed and quality
        Works on both x86 and ARM

tl1   - Ternary Lookup Table 1
        Optimized for ARM processors
        Uses lookup tables for faster computation

tl2   - Ternary Lookup Table 2
        Optimized for x86 processors
        Better performance on Intel/AMD CPUs

The key insight: ARM CPUs use TL1, x86 CPUs use TL2. Both support I2_S.
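The ARM/x86 rule can be sketched in a few lines of Python. This assumes the strings that `platform.machine()` commonly reports (`arm64`, `aarch64`, `x86_64`, `AMD64`); the helper names are hypothetical and not part of BitNet.

```python
import platform


def preferred_kernel(machine: str) -> str:
    """Map a platform.machine() string to the kernel this guide pairs with it."""
    m = machine.lower()
    if m.startswith("arm") or m == "aarch64":
        return "tl1"  # ARM CPUs use TL1
    if m in ("x86_64", "amd64", "i386", "i686"):
        return "tl2"  # x86 CPUs use TL2
    raise ValueError(f"unrecognized CPU architecture: {machine!r}")


def detect_kernel() -> str:
    """Pick the kernel for the machine this script is running on."""
    return preferred_kernel(platform.machine())
```

Raising on an unknown architecture is a deliberate choice in this sketch: it avoids silently picking a kernel for a CPU the guide does not cover.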

Official Microsoft Model

BitNet-b1.58-2B-4T

This is the model I recommend for beginners. Microsoft trained it on 4 trillion tokens, so it’s reasonably capable:

Parameters:        2.4 billion
Training Tokens:   4 trillion
License:           MIT
Context Length:    4096
Memory (i2_s):     ~400 MB

Kernel Compatibility:

┌─────────────────────────────────────────────────────────┐
│  CPU Type   │   I2_S   │   TL1   │   TL2   │  Best     │
├─────────────────────────────────────────────────────────┤
│  x86        │    OK    │   NO    │   OK    │  TL2      │
│  ARM        │    OK    │   OK    │   NO    │  TL1      │
└─────────────────────────────────────────────────────────┘

I tested both kernels on my Mac (ARM) and found TL1 about 15% faster than I2_S for this model.

When to Choose the Official Model

Pick this model if you:

  - Are new to BitNet and want the “reference” experience
  - Have limited RAM (< 2GB available)
  - Need a model with MIT license (commercial use allowed)
  - Want the most community support and documentation

Community Models by Size

Small Models (0.7B - 1B Parameters)

These models run fast even on older hardware:

┌─────────────────────────────────────────────────────────────┐
│  Model                    │  Params  │  Memory  │  License  │
├─────────────────────────────────────────────────────────────┤
│  bitnet_b1_58-large       │   0.7B   │  ~150MB  │  MIT      │
│  Falcon-E-1B-Instruct     │   1.0B   │  ~200MB  │  Apache   │
└─────────────────────────────────────────────────────────────┘

I tested bitnet_b1_58-large on an old Raspberry Pi 4 (4GB RAM) - it worked, though slowly. Great for edge devices.

Kernel Compatibility for Small Models:

┌─────────────────────────────────────────────────────────┐
│  bitnet_b1_58-large Kernel Support                      │
├─────────────────────────────────────────────────────────┤
│  CPU Type   │   I2_S   │   TL1   │   TL2   │  Best     │
├─────────────────────────────────────────────────────────┤
│  x86        │    OK    │   NO    │   OK    │  TL2      │
│  ARM        │    OK    │   OK    │   NO    │  TL1      │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  Falcon-E-1B Kernel Support                             │
├─────────────────────────────────────────────────────────┤
│  CPU Type   │   I2_S   │   TL1   │   TL2   │  Best     │
├─────────────────────────────────────────────────────────┤
│  x86        │    OK    │   NO    │   OK    │  TL2      │
│  ARM        │    OK    │   OK    │   NO    │  TL1      │
└─────────────────────────────────────────────────────────┘

Medium Models (2B - 3B Parameters)

The sweet spot for most users:

┌─────────────────────────────────────────────────────────────┐
│  Model                    │  Params  │  Memory  │  License  │
├─────────────────────────────────────────────────────────────┤
│  BitNet-b1.58-2B-4T       │   2.4B   │  ~400MB  │  MIT      │
│  bitnet_b1_58-3B          │   3.3B   │  ~550MB  │  MIT      │
│  Falcon-E-3B-Instruct     │   3.0B   │  ~500MB  │  Apache   │
└─────────────────────────────────────────────────────────────┘

Important: bitnet_b1_58-3B has different kernel support from the other models:

┌─────────────────────────────────────────────────────────┐
│  CPU Type   │   I2_S   │   TL1   │   TL2   │  Best     │
├─────────────────────────────────────────────────────────┤
│  x86        │    NO    │   NO    │   OK    │  TL2      │
│  ARM        │    NO    │   OK    │   NO    │  TL1      │
└─────────────────────────────────────────────────────────┘

Notice that I2_S is NOT supported for bitnet_b1_58-3B. You must use TL1 on ARM or TL2 on x86. I learned this the hard way when I tried to quantize with I2_S:

RuntimeError: I2_S kernel not supported for this model architecture

Large Models (7B - 10B Parameters)

For when you need better quality and have more RAM:

┌─────────────────────────────────────────────────────────────┐
│  Model                    │  Params  │  Memory  │  License  │
├─────────────────────────────────────────────────────────────┤
│  Llama3-8B-1.58-100B      │   8.0B   │  ~1.3GB  │  Llama    │
│  Falcon3-7B-Instruct-1.58 │   7.0B   │  ~1.2GB  │  Apache   │
│  Falcon3-10B-Instruct-1.58│  10.0B   │  ~1.7GB  │  Apache   │
└─────────────────────────────────────────────────────────────┘

The Llama3-based model was trained on 100B tokens, far fewer than the official model's 4T, but the base Llama3 architecture is strong.

Kernel Compatibility for Large Models:

┌─────────────────────────────────────────────────────────┐
│  Llama3-8B-1.58 Kernel Support                          │
├─────────────────────────────────────────────────────────┤
│  CPU Type   │   I2_S   │   TL1   │   TL2   │  Best     │
├─────────────────────────────────────────────────────────┤
│  x86        │    OK    │   NO    │   OK    │  TL2      │
│  ARM        │    OK    │   OK    │   NO    │  TL1      │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  Falcon3 Family Kernel Support                          │
├─────────────────────────────────────────────────────────┤
│  CPU Type   │   I2_S   │   TL1   │   TL2   │  Best     │
├─────────────────────────────────────────────────────────┤
│  x86        │    OK    │   NO    │   OK    │  TL2      │
│  ARM        │    OK    │   OK    │   NO    │  TL1      │
└─────────────────────────────────────────────────────────┘

Complete Compatibility Matrix

Here’s the full picture at a glance:

                        I2_S        TL1        TL2
Model                  x86  ARM   x86  ARM  x86  ARM
─────────────────────────────────────────────────────
BitNet-b1.58-2B-4T     OK   OK    NO   OK   OK   NO
bitnet_b1_58-large     OK   OK    NO   OK   OK   NO
bitnet_b1_58-3B        NO   NO    NO   OK   OK   NO
Llama3-8B-1.58         OK   OK    NO   OK   OK   NO
Falcon3 family         OK   OK    NO   OK   OK   NO
Falcon-E family        OK   OK    NO   OK   OK   NO
─────────────────────────────────────────────────────
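The matrix can also be kept as data and queried before choosing a `-q` value. This is a sketch that transcribes the rows above; `MATRIX`, `kernels_for`, and `check_kernel` are illustrative names, and the model keys are the labels used in the matrix rather than full Hugging Face repo IDs.

```python
# Kernel sets transcribed from the compatibility matrix in this guide.
STANDARD = {"x86": {"i2_s", "tl2"}, "arm": {"i2_s", "tl1"}}
NO_I2S = {"x86": {"tl2"}, "arm": {"tl1"}}  # bitnet_b1_58-3B has no I2_S support

MATRIX = {
    "BitNet-b1.58-2B-4T": STANDARD,
    "bitnet_b1_58-large": STANDARD,
    "bitnet_b1_58-3B": NO_I2S,
    "Llama3-8B-1.58": STANDARD,
    "Falcon3 family": STANDARD,
    "Falcon-E family": STANDARD,
}


def kernels_for(model: str, arch: str) -> list:
    """Return the sorted kernels listed as OK for a model on 'x86' or 'arm'."""
    return sorted(MATRIX[model][arch])


def check_kernel(model: str, arch: str, kernel: str) -> None:
    """Raise ValueError if the matrix marks this combination as NO."""
    allowed = kernels_for(model, arch)
    if kernel not in allowed:
        raise ValueError(f"{kernel} is not listed for {model} on {arch}; use one of {allowed}")
```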

Choosing the Right Model

I use this decision flow:

                    ┌─────────────────┐
                    │ Your RAM limit? │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
           < 1GB         1-2 GB          > 2GB
              │              │              │
              ▼              ▼              ▼
     ┌────────────┐  ┌────────────┐  ┌────────────┐
     │ 0.7B-1B    │  │ 2B-3B      │  │ 7B-10B     │
     │ models     │  │ models     │  │ models     │
     └────────────┘  └────────────┘  └────────────┘
           │              │              │
           ▼              ▼              ▼
     bitnet_large   BitNet-2B      Llama3-8B
     Falcon-E-1B    bitnet-3B      Falcon3-10B
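The same flow can be written as a small function. This sketch assumes the boundaries read as "under 1 GB", "1 to 2 GB inclusive", and "over 2 GB"; the function name is hypothetical and the model names are the table labels used in this guide.

```python
def models_for_ram(ram_gb: float) -> list:
    """Return the candidate models from the decision flow for a RAM budget in GB."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    if ram_gb < 1:
        # 0.7B-1B tier
        return ["bitnet_b1_58-large", "Falcon-E-1B-Instruct"]
    if ram_gb <= 2:
        # 2B-3B tier
        return ["BitNet-b1.58-2B-4T", "bitnet_b1_58-3B"]
    # 7B-10B tier
    return ["Llama3-8B-1.58", "Falcon3-10B-Instruct-1.58"]
```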

By Use Case

Use Case          Recommended Model          Why
─────────────────────────────────────────────────────────────────────────────
Quick testing     bitnet_b1_58-large         Fastest download and inference
General purpose   BitNet-b1.58-2B-4T         Best quality/size ratio
Better quality    Falcon3-7B-Instruct-1.58   Larger context, more capable
Edge devices      Falcon-E-1B-Instruct       Optimized for low-resource devices
Maximum quality   Falcon3-10B-Instruct-1.58  Best results, needs 2GB+ RAM

Downloading Models

Use huggingface-cli to download models:

# Install huggingface-hub first
pip install huggingface-hub

# Official model (recommended)
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T

# Large model
huggingface-cli download HF1BitLLM/Llama3-8B-1.58-100B-tokens \
    --local-dir models/Llama3-8B-1.58

# Falcon3
huggingface-cli download tiiuae/Falcon3-7B-Instruct-1.58bit \
    --local-dir models/Falcon3-7B
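The same downloads can be scripted from Python. This sketch assumes the `snapshot_download` function from the `huggingface_hub` package installed above; the `DOWNLOADS` mapping and the `download` helper are illustrative names, with repo IDs and target directories copied from the commands in this section.

```python
from pathlib import Path

# Repo ID -> local directory, as used in the CLI commands above.
DOWNLOADS = {
    "microsoft/BitNet-b1.58-2B-4T-gguf": "models/BitNet-b1.58-2B-4T",
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens": "models/Llama3-8B-1.58",
    "tiiuae/Falcon3-7B-Instruct-1.58bit": "models/Falcon3-7B",
}


def download(repo_id: str) -> Path:
    """Download one of the repos above into its local directory."""
    # Imported here so the mapping can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    local_dir = DOWNLOADS[repo_id]
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(local_dir)


# Example (performs a real network download):
# download("microsoft/BitNet-b1.58-2B-4T-gguf")
```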

Resume Interrupted Downloads

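If a download fails (common with large models), re-run it with --resume-download. Recent huggingface-hub releases resume partial downloads automatically and mark this flag as deprecated, so drop it if your version rejects it: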

huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T \
    --resume-download

Running Different Models

After downloading, use setup_env.py with the correct quantization:

# Official 2B model with I2_S (works on all CPUs)
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

# 3B model - MUST use TL1 (ARM) or TL2 (x86)
python setup_env.py -md models/bitnet_b1_58-3B -q tl1  # ARM
python setup_env.py -md models/bitnet_b1_58-3B -q tl2  # x86

# Llama3-8B with I2_S
python setup_env.py -md models/Llama3-8B-1.58 -q i2_s

Then run inference:

python run_inference.py \
    -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" \
    -cnv
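The setup and inference steps can be chained from one script. This is a sketch under stated assumptions: it uses only the flags shown above (`-md`, `-q`, `-m`, `-p`, `-cnv`), it must run from the BitNet checkout where `setup_env.py` and `run_inference.py` live, and it assumes the output file follows the `ggml-model-<quant>.gguf` pattern, which this guide only shows for `i2_s`. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import PurePosixPath

QUANT_TYPES = ("i2_s", "tl1", "tl2")


def setup_command(model_dir: str, quant: str) -> list:
    """Build the setup_env.py call for a model directory and quantization type."""
    if quant not in QUANT_TYPES:
        raise ValueError(f"quant must be one of {QUANT_TYPES}, got {quant!r}")
    return [sys.executable, "setup_env.py", "-md", model_dir, "-q", quant]


def inference_command(model_dir: str, quant: str, prompt: str) -> list:
    """Build the run_inference.py call in conversation mode (-cnv)."""
    # Assumption: the GGUF file name embeds the quantization type.
    gguf = str(PurePosixPath(model_dir) / f"ggml-model-{quant}.gguf")
    return [sys.executable, "run_inference.py", "-m", gguf, "-p", prompt, "-cnv"]


def run_pipeline(model_dir: str, quant: str, prompt: str) -> None:
    """Run setup, then start an interactive session; stops on the first failure."""
    subprocess.run(setup_command(model_dir, quant), check=True)
    subprocess.run(inference_command(model_dir, quant, prompt), check=True)
```

Building the argument lists separately from running them keeps the commands easy to print and inspect before anything executes.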

Model Quality Comparison

I ran a simple test: asking each model to explain quantum computing in one paragraph. Here’s my subjective ranking:

Model                    Quality  Speed    Overall
───────────────────────────────────────────────────
BitNet-b1.58-2B-4T         4/5     5/5      4.5/5
bitnet_b1_58-large         2/5     5/5      3.5/5
bitnet_b1_58-3B            4/5     4/5      4.0/5
Llama3-8B-1.58             4/5     3/5      3.5/5
Falcon3-7B-Instruct-1.58   4/5     3/5      3.5/5
Falcon3-10B-Instruct-1.58  5/5     2/5      3.5/5

The 2B model hits the sweet spot for most tasks. The 10B model has the best quality but runs noticeably slower on CPU.
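The Overall column is consistent with a plain average of Quality and Speed. Assuming that is how it was derived, the ranking can be reproduced as below; the ratings are copied from the table and remain subjective, and the names are illustrative.

```python
# (quality, speed) out of 5, copied from the comparison table above.
RATINGS = {
    "BitNet-b1.58-2B-4T": (4, 5),
    "bitnet_b1_58-large": (2, 5),
    "bitnet_b1_58-3B": (4, 4),
    "Llama3-8B-1.58": (4, 3),
    "Falcon3-7B-Instruct-1.58": (4, 3),
    "Falcon3-10B-Instruct-1.58": (5, 2),
}


def overall(quality: int, speed: int) -> float:
    """Average the two ratings (assumed derivation of the Overall column)."""
    return (quality + speed) / 2


def best_model() -> str:
    """Return the model with the highest averaged score."""
    return max(RATINGS, key=lambda name: overall(*RATINGS[name]))
```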

Summary

When choosing a BitNet model:

  - Check your CPU architecture - ARM uses TL1, x86 uses TL2
  - Match model to your RAM - 0.7B for <1GB, 2B for 1-2GB, 7B+ for >2GB
  - Use I2_S when available - it works on both architectures
  - Start with the official 2B model - best balance for beginners
  - Upgrade to 7B+ for quality - if you have the RAM and patience

The official BitNet-b1.58-2B-4T model is the safest choice for most users. Community models like Falcon3 and Llama3-8B offer alternatives with different trade-offs between size and quality.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
