BitNet Supported Models: Complete Guide to 1-bit LLMs You Can Run

Problem

I downloaded BitNet and tried to run a model I found on Hugging Face. It didn’t work:

Model Not Supported Error
ValueError: Unsupported model architecture
KeyError: 'model_name' not found in SUPPORTED_HF_MODELS

Turns out, BitNet only supports specific models, and the model I picked wasn’t on the list. This guide covers all the models you can actually use with BitNet and which kernel works best for your CPU.

The Short Answer

BitNet currently supports 9 models from Hugging Face:

Supported Model List
Official Microsoft:
- microsoft/BitNet-b1.58-2B-4T (2.4B params) - Recommended
Community Models:
- 1bitLLM/bitnet_b1_58-large (0.7B params)
- 1bitLLM/bitnet_b1_58-3B (3.3B params)
- HF1BitLLM/Llama3-8B-1.58-100B-tokens (8.0B params)
- tiiuae/Falcon3-1B-Instruct-1.58bit
- tiiuae/Falcon3-7B-Instruct-1.58bit
- tiiuae/Falcon3-10B-Instruct-1.58bit
- tiiuae/Falcon-E-1B-Instruct
- tiiuae/Falcon-E-3B-Instruct

But there’s a catch: kernel support varies by CPU architecture. Let me explain.

Understanding Kernels

BitNet uses specialized kernels for inference, and not all kernels work on all CPUs:

Kernel Types Explained
i2_s - Integer 2-bit symmetric quantization
  - Good balance of speed and quality
  - Works on both x86 and ARM
tl1 - Ternary Lookup Table 1
  - Optimized for ARM processors
  - Uses lookup tables for faster computation
tl2 - Ternary Lookup Table 2
  - Optimized for x86 processors
  - Better performance on Intel/AMD CPUs

The key insight: ARM CPUs use TL1, x86 CPUs use TL2. Both support I2_S for every model except bitnet_b1_58-3B, which is covered in the medium models section.
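The ARM/x86 rule can be sketched as a small helper. This is only an illustration: `pick_kernel` and `detect_kernel` are hypothetical names, not part of BitNet, and the machine strings listed are an assumption about what `platform.machine()` commonly reports.

```python
import platform

# Machine strings commonly reported by platform.machine(). This list is an
# assumption and may need extending for other systems.
ARM_MACHINES = {"arm64", "aarch64"}
X86_MACHINES = {"x86_64", "amd64"}


def pick_kernel(machine: str) -> str:
    """Return the architecture-specific kernel named in this guide."""
    name = machine.lower()
    if name in ARM_MACHINES:
        return "tl1"  # ARM CPUs use TL1
    if name in X86_MACHINES:
        return "tl2"  # x86 CPUs use TL2
    raise ValueError(f"Unrecognized CPU architecture: {machine!r}")


def detect_kernel() -> str:
    """Pick the kernel for the machine this script is running on."""
    return pick_kernel(platform.machine())
```

Calling `detect_kernel()` on an Apple Silicon Mac would be expected to return `tl1`, provided the interpreter reports `arm64`.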

Official Microsoft Model

BitNet-b1.58-2B-4T

This is the model I recommend for beginners. Microsoft trained it on 4 trillion tokens, so it’s reasonably capable:

Model Specifications
Parameters: 2.4 billion
Training Tokens: 4 trillion
License: MIT
Context Length: 4096
Memory (i2_s): ~400 MB

Kernel Compatibility:

BitNet-b1.58-2B-4T Kernel Support
┌──────────┬──────┬─────┬─────┬──────┐
│ CPU Type │ I2_S │ TL1 │ TL2 │ Best │
├──────────┼──────┼─────┼─────┼──────┤
│ x86      │ OK   │ NO  │ OK  │ TL2  │
│ ARM      │ OK   │ OK  │ NO  │ TL1  │
└──────────┴──────┴─────┴─────┴──────┘

I tested I2_S and TL1 on my Mac (ARM) and found TL1 about 15% faster than I2_S for this model.

When to Choose the Official Model

Pick this model if you:

  • Are new to BitNet and want the “reference” experience
  • Have limited RAM (< 2GB available)
  • Need a model with MIT license (commercial use allowed)
  • Want the most community support and documentation

Community Models by Size

Small Models (0.7B - 1B Parameters)

These models run fast even on older hardware:

Small Model Comparison
┌──────────────────────┬────────┬────────┬─────────┐
│ Model                │ Params │ Memory │ License │
├──────────────────────┼────────┼────────┼─────────┤
│ bitnet_b1_58-large   │ 0.7B   │ ~150MB │ MIT     │
│ Falcon-E-1B-Instruct │ 1.0B   │ ~200MB │ Apache  │
└──────────────────────┴────────┴────────┴─────────┘

I tested bitnet_b1_58-large on an old Raspberry Pi 4 (4GB RAM) - it worked, though slowly. Great for edge devices.

Kernel Compatibility for Small Models:

bitnet_b1_58-large Kernel Support
┌──────────┬──────┬─────┬─────┬──────┐
│ CPU Type │ I2_S │ TL1 │ TL2 │ Best │
├──────────┼──────┼─────┼─────┼──────┤
│ x86      │ OK   │ NO  │ OK  │ TL2  │
│ ARM      │ OK   │ OK  │ NO  │ TL1  │
└──────────┴──────┴─────┴─────┴──────┘

Falcon-E-1B Kernel Support
┌──────────┬──────┬─────┬─────┬──────┐
│ CPU Type │ I2_S │ TL1 │ TL2 │ Best │
├──────────┼──────┼─────┼─────┼──────┤
│ x86      │ OK   │ NO  │ OK  │ TL2  │
│ ARM      │ OK   │ OK  │ NO  │ TL1  │
└──────────┴──────┴─────┴─────┴──────┘

Medium Models (2B - 3B Parameters)

The sweet spot for most users:

Medium Model Comparison
┌──────────────────────┬────────┬────────┬─────────┐
│ Model                │ Params │ Memory │ License │
├──────────────────────┼────────┼────────┼─────────┤
│ BitNet-b1.58-2B-4T   │ 2.4B   │ ~400MB │ MIT     │
│ bitnet_b1_58-3B      │ 3.3B   │ ~550MB │ MIT     │
│ Falcon-E-3B-Instruct │ 3.0B   │ ~500MB │ Apache  │
└──────────────────────┴────────┴────────┴─────────┘

Important: bitnet_b1_58-3B has different kernel support from the other models:

bitnet_b1_58-3B Kernel Support
┌──────────┬──────┬─────┬─────┬──────┐
│ CPU Type │ I2_S │ TL1 │ TL2 │ Best │
├──────────┼──────┼─────┼─────┼──────┤
│ x86      │ NO   │ NO  │ OK  │ TL2  │
│ ARM      │ NO   │ OK  │ NO  │ TL1  │
└──────────┴──────┴─────┴─────┴──────┘

Notice that I2_S is NOT supported for bitnet_b1_58-3B. You must use TL1 on ARM or TL2 on x86. I learned this the hard way when I tried to quantize with I2_S:

Error with Wrong Kernel
RuntimeError: I2_S kernel not supported for this model architecture

Large Models (7B - 10B Parameters)

For when you need better quality and have more RAM:

Large Model Comparison
┌───────────────────────────┬────────┬────────┬─────────┐
│ Model                     │ Params │ Memory │ License │
├───────────────────────────┼────────┼────────┼─────────┤
│ Llama3-8B-1.58-100B       │ 8.0B   │ ~1.3GB │ Llama   │
│ Falcon3-7B-Instruct-1.58  │ 7.0B   │ ~1.2GB │ Apache  │
│ Falcon3-10B-Instruct-1.58 │ 10.0B  │ ~1.7GB │ Apache  │
└───────────────────────────┴────────┴────────┴─────────┘

The Llama3-based model was trained on 100B tokens, far fewer than the official model's 4T, but the base Llama3 architecture is strong.

Kernel Compatibility for Large Models:

Llama3-8B-1.58 Kernel Support
┌──────────┬──────┬─────┬─────┬──────┐
│ CPU Type │ I2_S │ TL1 │ TL2 │ Best │
├──────────┼──────┼─────┼─────┼──────┤
│ x86      │ OK   │ NO  │ OK  │ TL2  │
│ ARM      │ OK   │ OK  │ NO  │ TL1  │
└──────────┴──────┴─────┴─────┴──────┘

Falcon3 Family Kernel Support
┌──────────┬──────┬─────┬─────┬──────┐
│ CPU Type │ I2_S │ TL1 │ TL2 │ Best │
├──────────┼──────┼─────┼─────┼──────┤
│ x86      │ OK   │ NO  │ OK  │ TL2  │
│ ARM      │ OK   │ OK  │ NO  │ TL1  │
└──────────┴──────┴─────┴─────┴──────┘

Complete Compatibility Matrix

Here’s the full picture at a glance:

Full Kernel Compatibility Matrix
                      I2_S          TL1           TL2
Model                 x86    ARM    x86    ARM    x86    ARM
────────────────────────────────────────────────────────────
BitNet-b1.58-2B-4T    OK     OK     NO     OK     OK     NO
bitnet_b1_58-large    OK     OK     NO     OK     OK     NO
bitnet_b1_58-3B       NO     NO     NO     OK     OK     NO
Llama3-8B-1.58        OK     OK     NO     OK     OK     NO
Falcon3 family        OK     OK     NO     OK     OK     NO
Falcon-E family       OK     OK     NO     OK     OK     NO
────────────────────────────────────────────────────────────
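The matrix can also be written down as data, so a script can reject an unsupported combination before any quantization work starts. The sketch below transcribes the rows above; `KERNEL_SUPPORT`, `supported_kernels`, and `check_kernel` are hypothetical names and not part of BitNet.

```python
# Kernels marked OK in the matrix above, per CPU architecture.
STANDARD = {"x86": ("i2_s", "tl2"), "arm": ("i2_s", "tl1")}
LOOKUP_ONLY = {"x86": ("tl2",), "arm": ("tl1",)}  # no I2_S support

KERNEL_SUPPORT = {
    "BitNet-b1.58-2B-4T": STANDARD,
    "bitnet_b1_58-large": STANDARD,
    "bitnet_b1_58-3B": LOOKUP_ONLY,
    "Llama3-8B-1.58": STANDARD,
    "Falcon3 family": STANDARD,
    "Falcon-E family": STANDARD,
}


def supported_kernels(model: str, arch: str) -> tuple:
    """Return the kernels marked OK for a model row and CPU architecture."""
    return KERNEL_SUPPORT[model][arch]


def check_kernel(model: str, arch: str, kernel: str) -> None:
    """Raise ValueError if the matrix marks this combination as NO."""
    allowed = supported_kernels(model, arch)
    if kernel not in allowed:
        raise ValueError(
            f"{kernel} is not supported for {model} on {arch}; "
            f"use one of: {', '.join(allowed)}"
        )
```

For example, `check_kernel("bitnet_b1_58-3B", "arm", "i2_s")` raises a `ValueError` that points at `tl1` instead.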

Choosing the Right Model

I use this decision flow:

Model Selection Guide
            ┌─────────────────┐
            │ Your RAM limit? │
            └────────┬────────┘
      ┌──────────────┼──────────────┐
      ▼              ▼              ▼
    < 1GB         1-2 GB          > 2GB
      │              │              │
      ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│  0.7B-1B   │ │   2B-3B    │ │   7B-10B   │
│   models   │ │   models   │ │   models   │
└────────────┘ └────────────┘ └────────────┘
      │              │              │
      ▼              ▼              ▼
 bitnet_large    BitNet-2B      Llama3-8B
 Falcon-E-1B     bitnet-3B      Falcon3-10B
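The same flow can be expressed as a function of available RAM. This is a sketch, not a tool that ships with BitNet: `recommend_models` is a hypothetical name, and treating exactly 1 GB and exactly 2 GB as part of the middle tier is an assumption, since the diagram does not say where the boundaries fall.

```python
from __future__ import annotations


def recommend_models(ram_gb: float) -> list[str]:
    """Map available RAM (in GB) to the model tier from the decision flow."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    if ram_gb < 1:
        # Small tier: 0.7B-1B models
        return ["bitnet_b1_58-large", "Falcon-E-1B-Instruct"]
    if ram_gb <= 2:
        # Medium tier: 2B-3B models (boundary handling is an assumption)
        return ["BitNet-b1.58-2B-4T", "bitnet_b1_58-3B"]
    # Large tier: 7B-10B models
    return ["Llama3-8B-1.58-100B-tokens", "Falcon3-10B-Instruct-1.58bit"]
```

With 1.5 GB free, `recommend_models(1.5)` returns the two medium-tier models.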

By Use Case

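┌─────────────────┬───────────────────────────┬────────────────────────────────┐
│ Use Case        │ Recommended Model         │ Why                            │
├─────────────────┼───────────────────────────┼────────────────────────────────┤
│ Quick testing   │ bitnet_b1_58-large        │ Fastest download and inference │
│ General purpose │ BitNet-b1.58-2B-4T        │ Best quality/size ratio        │
│ Better quality  │ Falcon3-7B-Instruct-1.58  │ Larger context, more capable   │
│ Edge devices    │ Falcon-E-1B-Instruct      │ Optimized for low resource     │
│ Maximum quality │ Falcon3-10B-Instruct-1.58 │ Best results, needs 2GB+ RAM   │
└─────────────────┴───────────────────────────┴────────────────────────────────┘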

Downloading Models

Use huggingface-cli to download models:

Download models from Hugging Face
# Install huggingface-hub first
pip install huggingface-hub
# Official model (recommended)
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T
# Large model
huggingface-cli download HF1BitLLM/Llama3-8B-1.58-100B-tokens \
    --local-dir models/Llama3-8B-1.58
# Falcon3
huggingface-cli download tiiuae/Falcon3-7B-Instruct-1.58bit \
    --local-dir models/Falcon3-7B
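If you download several models, the repeated commands can be generated from a table of repository IDs and target directories. The sketch below only builds and prints the commands shown above, it does not run them. `DOWNLOADS`, `download_command`, and `render_commands` are hypothetical names chosen for illustration.

```python
import shlex

# Repository ID -> local directory, copied from the commands above.
DOWNLOADS = {
    "microsoft/BitNet-b1.58-2B-4T-gguf": "models/BitNet-b1.58-2B-4T",
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens": "models/Llama3-8B-1.58",
    "tiiuae/Falcon3-7B-Instruct-1.58bit": "models/Falcon3-7B",
}


def download_command(repo_id: str, local_dir: str) -> list:
    """Build the huggingface-cli argument list for one model."""
    return ["huggingface-cli", "download", repo_id, "--local-dir", local_dir]


def render_commands(downloads: dict = DOWNLOADS) -> list:
    """Return one shell-ready command string per model."""
    return [
        shlex.join(download_command(repo_id, local_dir))
        for repo_id, local_dir in downloads.items()
    ]


for line in render_commands():
    print(line)
```

To execute a download rather than print it, the argument list from `download_command` can be passed to `subprocess.run(..., check=True)`.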

Resume Interrupted Downloads

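If a download is interrupted (common with large models), rerun the command. The --resume-download flag asks for resumption explicitly; recent huggingface_hub releases resume by default and mark the flag as deprecated:

Resume download
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
    --local-dir models/BitNet-b1.58-2B-4T \
    --resume-download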

Running Different Models

After downloading, use setup_env.py with the correct quantization:

Setup and run different models
# Official 2B model with I2_S (works on all CPUs)
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
# 3B model - MUST use TL1 (ARM) or TL2 (x86)
python setup_env.py -md models/bitnet_b1_58-3B -q tl1 # ARM
python setup_env.py -md models/bitnet_b1_58-3B -q tl2 # x86
# Llama3-8B with I2_S
python setup_env.py -md models/Llama3-8B-1.58 -q i2_s

Then run inference:

Run inference
python run_inference.py \
    -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" \
    -cnv
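The setup and inference steps share the model directory and quantization type, so both commands can be derived from those two values. This is a sketch under one stated assumption: the output file is named `ggml-model-<quant>.gguf` for every quantization type, which generalizes the i2_s path in the example above. The helper names are hypothetical.

```python
from pathlib import PurePosixPath

VALID_QUANTS = ("i2_s", "tl1", "tl2")


def setup_command(model_dir: str, quant: str) -> list:
    """Build the setup_env.py call for one model directory."""
    if quant not in VALID_QUANTS:
        raise ValueError(f"Unknown quantization type: {quant!r}")
    return ["python", "setup_env.py", "-md", model_dir, "-q", quant]


def gguf_path(model_dir: str, quant: str) -> str:
    # Assumption: the file name follows ggml-model-<quant>.gguf, as in the
    # i2_s example; check your model directory after running setup.
    return str(PurePosixPath(model_dir) / f"ggml-model-{quant}.gguf")


def inference_command(model_dir: str, quant: str, prompt: str) -> list:
    """Build the run_inference.py call with the same flags as the example."""
    return [
        "python", "run_inference.py",
        "-m", gguf_path(model_dir, quant),
        "-p", prompt,
        "-cnv",
    ]
```

Passing the same `model_dir` and `quant` to both builders keeps the inference path in step with whatever was quantized.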

Model Quality Comparison

I ran a simple test: asking each model to explain quantum computing in one paragraph. Here’s my subjective ranking:

Quality Comparison (1-5 scale)
Model                       Quality   Speed   Overall
─────────────────────────────────────────────────────
BitNet-b1.58-2B-4T          4/5       5/5     4.5/5
bitnet_b1_58-large          2/5       5/5     3.5/5
bitnet_b1_58-3B             4/5       4/5     4.0/5
Llama3-8B-1.58              4/5       3/5     3.5/5
Falcon3-7B-Instruct-1.58    4/5       3/5     3.5/5
Falcon3-10B-Instruct-1.58   5/5       2/5     3.5/5

The 2B model hits the sweet spot for most tasks. The 10B model has the best quality but runs noticeably slower on CPU.
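The Overall column is consistent with a plain average of the Quality and Speed scores in every row. That reading is an inference from the numbers, and the sketch below just reproduces it; `SCORES` and `overall` are names chosen for illustration.

```python
# (quality, speed) on the 1-5 scale, copied from the table above.
SCORES = {
    "BitNet-b1.58-2B-4T": (4, 5),
    "bitnet_b1_58-large": (2, 5),
    "bitnet_b1_58-3B": (4, 4),
    "Llama3-8B-1.58": (4, 3),
    "Falcon3-7B-Instruct-1.58": (4, 3),
    "Falcon3-10B-Instruct-1.58": (5, 2),
}


def overall(quality: int, speed: int) -> float:
    """Average the two subjective scores with equal weight."""
    return (quality + speed) / 2


# Highest overall score first; ties keep their order from the table.
ranking = sorted(SCORES, key=lambda name: overall(*SCORES[name]), reverse=True)

for name in ranking:
    print(f"{name}: {overall(*SCORES[name])}/5")
```

Weighting quality more heavily than speed would change the ranking, so adjust `overall` if your priorities differ.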

Summary

When choosing a BitNet model:

  1. Check your CPU architecture - ARM uses TL1, x86 uses TL2
  2. Match model to your RAM - 0.7B for <1GB, 2B for 1-2GB, 7B+ for >2GB
  3. Use I2_S as a portable fallback - it works on both architectures for every model except bitnet_b1_58-3B
  4. Start with the official 2B model - best balance for beginners
  5. Upgrade to 7B+ for quality - if you have the RAM and patience

The official BitNet-b1.58-2B-4T model is the safest choice for most users. Community models like Falcon3 and Llama3-8B offer alternatives with different trade-offs between size and quality.

Final Words + More Resources

I wrote this article to share my knowledge and experience. If you want to get in touch, you can reach me by email.