Can You Run DeepSeek V4 Locally on a MacBook?
Problem
I want to run frontier AI models locally. But DeepSeek V4 Pro is 865GB on Hugging Face. That’s way beyond my MacBook’s RAM. So I asked: can I actually run this thing locally?
What I Found
The answer is: yes, but with limitations.
DeepSeek V4 has two variants:
Model | Total Size | Total Parameters | Active Parameters-------------------|------------|------------------|-------------------DeepSeek V4 Pro | 865GB | 1.6T | 49BDeepSeek V4 Flash | 160GB | 284B | 13BFlash (160GB) can potentially run on a 128GB MacBook with light quantization. Pro (865GB) is harder, but possible with streaming.
Why Local Inference Is Possible
DeepSeek V4 uses Mixture of Experts (MoE) architecture. During inference, only a fraction of parameters are active:
┌─────────────────────────────────────────┐│ DeepSeek V4 Pro (1.6T) ││ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ││ │E1 │ │E2 │ │E3 │ │E4 │ │E5 │ │...│ ││ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘ ││ ↓ ││ Only 49B active at inference │└─────────────────────────────────────────┘You don’t need all 1.6T parameters in RAM. You only need the active experts.
For Pro, you could stream just the necessary active experts from disk. This trades speed for feasibility.
My Plan for Local Deployment
Step 1: Try via OpenRouter first
llm install llm-openrouterllm openrouter refreshllm -m openrouter/deepseek/deepseek-v4-flash 'Test prompt'This requires no local hardware.
Step 2: Wait for Unsloth quantized releases
The Unsloth team typically releases quantized versions quickly:
Original Flash: 160GB4-bit quantized: ~40GB (might fit in 48GB RAM)8-bit quantized: ~80GB (needs 96GB+ RAM)A lightly quantized Flash should run on a 128GB M5 MacBook Pro.
Step 3: Flash model on Hugging Face
DeepSeek V4 Pro: huggingface.co/deepseek-ai/deepseek-v4-pro (865GB)DeepSeek V4 Flash: huggingface.co/deepseek-ai/deepseek-v4-flash (160GB)Unloth models: huggingface.co/unsloth/modelsWhy Local Inference Matters
Local inference offers:
- Privacy: No data leaves your machine
- No rate limits: Use as much as you want
- Zero per-token costs: Pay only for hardware
- Offline access: Works without internet
For developers with powerful Macs, DeepSeek V4 Flash is accessible.
Common Mistake
I initially assumed I needed the full model size in RAM. That’s wrong for MoE models. You only need the active experts. Streaming from disk is viable, though slower.
Summary
In this post, I explained whether DeepSeek V4 can run locally on a MacBook. The key point is that MoE architecture means you don’t need all parameters in RAM. Flash with quantization should work on 128GB. Watch Unsloth for optimized releases.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments