Can You Run DeepSeek V4 Locally on a MacBook?

Problem

I want to run frontier AI models locally. But DeepSeek V4 Pro is 865GB on Hugging Face. That’s way beyond my MacBook’s RAM. So I asked: can I actually run this thing locally?

What I Found

The answer is: yes, but with limitations.

DeepSeek V4 has two variants:

DeepSeek V4 Model Sizes
Model | Total Size | Total Parameters | Active Parameters
-------------------|------------|------------------|-------------------
DeepSeek V4 Pro | 865GB | 1.6T | 49B
DeepSeek V4 Flash | 160GB | 284B | 13B

Flash (160GB) can potentially run on a 128GB MacBook with light quantization. Pro (865GB) is harder, but may be possible by streaming expert weights from disk.
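As a quick sanity check on the table, dividing checkpoint size by parameter count gives the average number of bits stored per parameter. This is a rough sketch, assuming 1 GB means 10^9 bytes and that the download is dominated by weights; the function name is my own.

```python
# Checkpoint sizes and parameter counts as quoted in the table above.
MODELS = {
    "DeepSeek V4 Pro": {"size_gb": 865, "total_params": 1.6e12},
    "DeepSeek V4 Flash": {"size_gb": 160, "total_params": 284e9},
}


def bits_per_param(size_gb: float, total_params: float) -> float:
    """Average stored bits per parameter, treating 1 GB as 10**9 bytes (assumption)."""
    return size_gb * 1e9 * 8 / total_params


for name, m in MODELS.items():
    bits = bits_per_param(m["size_gb"], m["total_params"])
    print(f"{name}: {bits:.2f} bits/param")
```

Both models come out at roughly 4.3 to 4.5 bits per parameter, which suggests the published checkpoints are already stored at fairly low precision. That would explain why only "light" extra quantization is needed to get Flash from 160GB under 128GB.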

Why Local Inference Is Possible

DeepSeek V4 uses a Mixture of Experts (MoE) architecture. During inference, only a fraction of the parameters are active for each token:

MoE Architecture Diagram
┌─────────────────────────────────────────┐
│         DeepSeek V4 Pro (1.6T)          │
│   ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐   │
│   │E1 │ │E2 │ │E3 │ │E4 │ │E5 │ │...│   │
│   └───┘ └───┘ └───┘ └───┘ └───┘ └───┘   │
│                    ↓                    │
│      Only 49B active at inference       │
└─────────────────────────────────────────┘

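You don’t need all 1.6T parameters in RAM at once. Each token is routed through only the active experts. The selection changes from token to token, though, so the remaining experts still have to be reachable on disk.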
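To make the "only some experts are active" idea concrete, here is a toy top-k router in plain Python. It is only a sketch: the expert count, top-k value and random gate scores are hypothetical and not DeepSeek V4's real configuration or routing code.

```python
import math
import random


def route_token(gate_logits: list[float], top_k: int) -> list[tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts of one token."""
    # Softmax over the router's scores (subtract the max for numerical stability).
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the top_k experts and renormalise their weights to sum to 1.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


random.seed(0)
NUM_EXPERTS, TOP_K = 64, 4  # hypothetical values for illustration only
used = set()
for _ in range(8):  # route eight tokens with random gate scores
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    picks = route_token(logits, TOP_K)
    used.update(i for i, _ in picks)
    print([i for i, _ in picks])
print(f"{len(used)} of {NUM_EXPERTS} experts touched across 8 tokens")
```

Each token uses only TOP_K experts, but different tokens pick different ones. Over a long generation most experts get touched at some point, which is why the full set of weights has to stay available even though little of it is needed at any one moment.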

For Pro, you could stream just the necessary active experts from disk. This trades speed for feasibility.
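One way streaming from disk could work is memory-mapping the weight file and copying out only the experts the router asks for. The sketch below uses a made-up file layout (equally sized float32 blocks written back to back) and tiny toy sizes. It is not how any real runtime stores DeepSeek weights.

```python
import mmap
import os
import struct
import tempfile

NUM_EXPERTS = 8                       # hypothetical toy values
FLOATS_PER_EXPERT = 1024
EXPERT_BYTES = FLOATS_PER_EXPERT * 4  # float32 is 4 bytes


def write_toy_checkpoint(path: str) -> None:
    """Write NUM_EXPERTS blocks back to back; expert i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            values = [float(i)] * FLOATS_PER_EXPERT
            f.write(struct.pack(f"<{FLOATS_PER_EXPERT}f", *values))


def load_expert(mm: mmap.mmap, index: int) -> tuple[float, ...]:
    """Copy one expert's weights out of the mapping; the OS pages in that region on demand."""
    start = index * EXPERT_BYTES
    return struct.unpack(f"<{FLOATS_PER_EXPERT}f", mm[start:start + EXPERT_BYTES])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "experts.bin")
    write_toy_checkpoint(path)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for idx in (2, 5):  # pretend the router picked experts 2 and 5
            weights = load_expert(mm, idx)
            print(idx, weights[0], len(weights))
```

The file can be far larger than RAM because mapping it does not read it. Only the regions that are actually sliced get pulled from disk, and the operating system can evict them again under memory pressure.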

My Plan for Local Deployment

Step 1: Try via OpenRouter first

Test DeepSeek V4 via OpenRouter
llm install llm-openrouter
llm keys set openrouter
llm openrouter refresh
llm -m openrouter/deepseek/deepseek-v4-flash 'Test prompt'

This requires no local hardware, only an OpenRouter API key (the second command prompts for it).

Step 2: Wait for Unsloth quantized releases

The Unsloth team typically releases quantized versions quickly:

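Expected Quantized Sizes (assuming the 160GB checkpoint holds 16-bit weights)
Original Flash: 160GB
4-bit quantized: ~40GB (might fit in 48GB RAM)
8-bit quantized: ~80GB (needs 96GB+ RAM)

These are rough guesses. At 284B parameters, 160GB works out to about 4.5 bits per parameter, so the release may already be stored at low precision and real quantized files could land much closer to 160GB.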

A lightly quantized Flash should run on a 128GB M5 MacBook Pro.
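To see how much quantization "lightly" might mean, this sketch estimates weight size directly from the 284B parameter count and checks it against a RAM budget. The 16 GB headroom for the OS and KV cache is my own placeholder, and real quantized files carry extra metadata and mixed-precision layers, so treat the numbers as ballpark only.

```python
def quantized_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10**9 bytes), ignoring metadata and scales."""
    return total_params * bits_per_weight / 8 / 1e9


def fits_in_ram(size_gb: float, ram_gb: float, headroom_gb: float = 16.0) -> bool:
    """True if the weights fit after reserving headroom (assumed) for OS and KV cache."""
    return size_gb + headroom_gb <= ram_gb


FLASH_PARAMS = 284e9  # total parameters from the table above
for bits in (2.0, 3.0, 4.0):
    size = quantized_size_gb(FLASH_PARAMS, bits)
    print(f"{bits:.0f}-bit: ~{size:.0f} GB, fits in 128 GB: {fits_in_ram(size, 128)}")
```

Because this starts from the parameter count rather than scaling down the 160GB download, the results can differ from estimates that assume a 16-bit original. On these assumptions a uniform 4-bit Flash is about 142 GB and would not fit in 128GB, while something closer to 3 bits per weight would.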

Step 3: Download the Flash model from Hugging Face

Hugging Face Locations
DeepSeek V4 Pro: huggingface.co/deepseek-ai/deepseek-v4-pro (865GB)
DeepSeek V4 Flash: huggingface.co/deepseek-ai/deepseek-v4-flash (160GB)
Unsloth models: huggingface.co/unsloth/models

Why Local Inference Matters

Local inference offers:

  • Privacy: No data leaves your machine
  • No rate limits: Use as much as you want
  • Zero per-token costs: Pay only for hardware
  • Offline access: Works without internet

For developers with powerful Macs, DeepSeek V4 Flash is accessible.

Common Mistake

I initially assumed I needed the full model size in RAM. That’s wrong for MoE models. Only the active experts are needed for any given token, so the rest can stay on disk and be streamed in on demand. That is viable, though slower.
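How much slower? A back-of-the-envelope estimate is to divide the bytes of active weights per token by disk read bandwidth. The sketch below assumes a 5 GB/s SSD (a placeholder, not a measured figure) and, pessimistically, that every active parameter not already cached has to come from disk for each token. In practice shared layers would likely stay resident.

```python
def seconds_per_token(active_params: float, bits_per_weight: float,
                      disk_gb_per_s: float, cached_fraction: float = 0.0) -> float:
    """Rough time to read one token's uncached active weights from disk."""
    active_gb = active_params * bits_per_weight / 8 / 1e9
    return active_gb * (1.0 - cached_fraction) / disk_gb_per_s


PRO_ACTIVE = 49e9            # active parameters for Pro, from the table
BITS = 865e9 * 8 / 1.6e12    # average bits/param implied by the 865GB checkpoint
for cached in (0.0, 0.5, 0.9):
    t = seconds_per_token(PRO_ACTIVE, BITS, disk_gb_per_s=5.0, cached_fraction=cached)
    print(f"cached {cached:.0%}: ~{t:.1f} s/token")
```

With nothing cached this lands at several seconds per token for Pro, and it improves in proportion to how much of the active set is already in RAM. That is the speed-for-feasibility trade in numbers.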

Summary

In this post, I explained whether DeepSeek V4 can run locally on a MacBook. The key point is that MoE architecture means you don’t need all parameters in RAM. Flash with quantization should work on 128GB. Watch Unsloth for optimized releases.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
