Ollama vs LM Studio vs Local AI Coding Tools: Best Setup for Running Agentic Models on Mac
I was deep into a refactoring session with Claude Code when my API quota ran out. Again. That’s when I started looking seriously at local models, not as a toy but as a daily driver for real coding work. The problem is that the landscape is fragmented: Ollama, LM Studio, llama.cpp, MLX, GPT4All, LocalAI. Each one claims to be the best, and none of them fully replicates the agentic loop Claude Code gives you.
After a few weeks of swapping backends and testing on my M5 Max 128GB, here’s what I found.

Why local models still can’t fully replace cloud coding AI
The core gap is agentic behavior. Most local setups handle tab completion and chat just fine — Continue.dev with Ollama gave me decent inline suggestions on day one. But the moment you ask for “refactor this module, update all imports, and write tests”, things fall apart. Local models lack reliable tool-calling, multi-step planning, and git-aware editing.
Cloud services like Claude Code or GitHub Copilot agent mode work because they run massive models with purpose-built orchestration layers. Running a 70B model locally gets you close on raw reasoning, but you still need the right tooling to wire up the agent loop.
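To make "agent loop" concrete, here is a minimal sketch in Python. It is not how Claude Code or Copilot is implemented. The `model` callable, the action dictionary shape, and the `read_file` tool are hypothetical stand-ins chosen for illustration; a real setup would replace `scripted_model` with a call to a local backend.

```python
from typing import Callable

# Hypothetical action format (an assumption for this sketch): the model returns
# {"tool": name, "args": {...}} to request a tool call, or {"final": text} to stop.
Model = Callable[[list], dict]


def run_agent(model: Model, tools: dict, task: str, max_steps: int = 8) -> str:
    """Alternate between model decisions and tool executions until the model finishes."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)
        if "final" in action:
            return action["final"]
        name = action.get("tool")
        if name not in tools:
            # Feed the error back so the model can recover instead of crashing the loop.
            observation = f"error: unknown tool {name!r}"
        else:
            try:
                observation = str(tools[name](**action.get("args", {})))
            except Exception as exc:  # a malformed call becomes an observation
                observation = f"error: {exc}"
        history.append({"role": "tool", "name": name, "content": observation})
    return "stopped: step limit reached"


def scripted_model(history: list) -> dict:
    """Stand-in for a real LLM: request one tool call, then finish."""
    if history[-1]["role"] == "user":
        return {"tool": "read_file", "args": {"path": "app.py"}}
    return {"final": f"saw: {history[-1]['content']}"}


fake_files = {"app.py": "print('hi')"}
result = run_agent(
    scripted_model,
    {"read_file": lambda path: fake_files[path]},
    "summarize app.py",
)
print(result)
```

The step limit and the error-as-observation handling are the parts that matter most with local models, since a smaller model is more likely to request a tool that does not exist or to pass bad arguments.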
The tool comparison
I tested five approaches. Here’s how they stack up:
| Tool | Best For | Agentic Support | Mac Performance |
|---|---|---|---|
| Ollama | Quick setup, model management, IDE integration | Limited (via Continue.dev) | Good (llama.cpp backend) |
| LM Studio | GUI, server mode, model testing | Better (OpenAI-compatible API) | Excellent (MLX support) |
| MLX | Max performance on Apple Silicon | Minimal (low-level) | Best (Apple native) |
| llama.cpp | Raw performance, customization | Via server mode | Good |
| Aider | Agentic coding in terminal | Strongest (multi-file editing, git integration) | Good (any backend) |
Ollama — simplest entry point
Ollama is the easiest to get started with. One brew install ollama and you’re pulling models by name. The ollama pull deepseek-coder-v2:72b-instruct-q4_K_M workflow is dead simple. It’s built on llama.cpp under the hood, so you get solid performance without tweaking.
Where it falls short: agentic support. Ollama serves models over a local HTTP API and does no orchestration of its own. You can hook it into Continue.dev for chat and completion, but don’t expect it to orchestrate multi-file edits by itself.
LM Studio — the agentic contender
LM Studio surprised me. Its built-in server mode exposes an OpenAI-compatible API with tool-calling support. That means you can point any tool that speaks the OpenAI format — Aider, Open Interpreter, custom scripts — at your local model and get semi-agentic behavior.
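As a rough illustration of what "tool-calling support" means on the wire, the sketch below builds a request body and parses tool calls from a response. It assumes the server follows the OpenAI chat-completions schema for `tools` and `tool_calls`. The `run_tests` tool and the model name are hypothetical, and no request is actually sent.

```python
import json

# Hypothetical tool definition: the name and parameters are illustrative only.
RUN_TESTS_TOOL = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test directory"}
            },
            "required": ["path"],
        },
    },
}


def build_chat_request(model: str, user_prompt: str, tools: list) -> dict:
    """Build a request body in the OpenAI chat-completions format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
    }


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from a chat-completions response dict."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# "local-model" is a placeholder; use whatever identifier your server reports.
request_body = build_chat_request("local-model", "Run the tests in ./tests", [RUN_TESTS_TOOL])
print(json.dumps(request_body, indent=2)[:80])
```

Note that `json.loads` raises if the model emits malformed arguments, which is one of the places where smaller local models tend to be less reliable than hosted ones.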
On the M5 Max, LM Studio with the MLX backend runs Qwen2.5-Coder 32B at a usable speed (~20 tok/s). For a 72B model, you’ll want 4-bit quantization and some patience, but it works.
MLX — peak Apple Silicon performance
MLX is Apple’s machine learning framework, and it’s fast. Really fast. Running a model through MLX instead of llama.cpp on the same hardware gave me roughly 30% more tokens per second. The catch: MLX is a framework, not a user-friendly tool. You need to write Python scripts to load and run models. Unless you’re optimizing every last bit of throughput, LM Studio (which can use MLX under the hood) is the better bet.
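If you want to reproduce a throughput comparison like this yourself, a small timing helper is enough. The sketch below is backend-agnostic: `stream_tokens` is a hypothetical callable that you would wire to whichever runtime you are measuring, and the numbers in the usage lines are made up for illustration, not measurements.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(stream_tokens: Callable[[], Iterable[str]]) -> float:
    """Time one full generation and return tokens per second."""
    start = time.perf_counter()
    count = sum(1 for _ in stream_tokens())
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def relative_gain(new_tps: float, baseline_tps: float) -> float:
    """Fractional speedup of new over baseline; 0.30 means 30% faster."""
    return new_tps / baseline_tps - 1.0


def fake_stream() -> Iterable[str]:
    """Stand-in generator so the helper can be run without a model."""
    for token in ["def", " add", "(", "a", ",", " b", ")", ":"]:
        time.sleep(0.001)
        yield token


print(f"{tokens_per_second(fake_stream):.0f} tok/s (fake stream)")
# Illustrative numbers only: 26 tok/s against a 20 tok/s baseline.
print(f"{relative_gain(26.0, 20.0):.0%} faster")
```

For a fair comparison, use the same model, quantization, prompt, and output length on both backends, and discard the first run so model loading is not counted.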
Aider — closest to Claude Code in a terminal
Aider is the only tool in this list that gives me real agentic coding. It understands your git repo, makes multi-file changes, and commits them with sensible messages. It supports --architect mode for planning before editing. I connected Aider to a local Ollama backend:

    aider --model ollama/deepseek-coder-v2:72b-instruct-q4_K_M \
      --architect \
      --no-stream

The --architect mode is key: the model first describes what it plans to change, then implements it. With a 72B model, the quality is surprisingly close to what I get from cloud services. The main downside: context window. Most local models max out at 32K-128K, so large codebases need careful file selection.
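"Careful file selection" can be approximated with a token budget. The sketch below is not something Aider does for you in this form. It is a hypothetical helper that assumes roughly four characters per token (a heuristic, not a tokenizer) and treats the order of the `files` dictionary as your priority order.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; about 4 characters per token is a heuristic only."""
    return int(len(text) / chars_per_token) + 1


def select_files(files: dict, context_window: int, reserve_for_output: int = 4096) -> list:
    """Greedily pick files in priority order until the input budget is exhausted.

    `files` maps path -> contents; dict order is treated as priority order.
    Files that do not fit are skipped so smaller, later files can still be added.
    """
    budget = context_window - reserve_for_output
    chosen = []
    for path, text in files.items():
        cost = estimate_tokens(text)
        if cost <= budget:
            chosen.append(path)
            budget -= cost
    return chosen


# Synthetic contents standing in for real source files.
example_files = {
    "core.py": "x" * 4000,    # about 1,001 estimated tokens
    "vendor.py": "x" * 40000,  # about 10,001 estimated tokens, too large here
    "utils.py": "x" * 400,     # about 101 estimated tokens
}
print(select_files(example_files, context_window=6000))
```

Reserving part of the window for the model's output matters in architect-style workflows, where the plan and the edits both have to fit alongside the files you added.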
What I run daily
After all the testing, I settled on a hybrid stack:
- Backend: LM Studio with Qwen2.5-Coder 32B (Q4) for agentic tasks, Ollama with a 7B model for fast autocomplete
- IDE: Continue.dev extension in VS Code, configured with both backends — Ollama for inline completion, LM Studio for chat
- Terminal: Aider wired to LM Studio’s API endpoint for big refactoring sessions
- API endpoint: http://localhost:1234/v1 (LM Studio serves OpenAI-compatible endpoints that every tool understands)
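With two backends running, it helps to confirm that each one is up and to see which model identifiers it reports before pointing tools at it. The sketch below assumes both servers answer `GET /v1/models` in the OpenAI list format. Port 1234 is the LM Studio address from the list above, and 11434 is Ollama's default port. The `BACKENDS` mapping and function names are my own.

```python
import json
import urllib.request

# Local base URLs: LM Studio's comes from the article; 11434 is Ollama's default port.
BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}


def parse_model_ids(body: bytes) -> list:
    """Extract model ids from an OpenAI-style `GET /models` response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, timeout: float = 3.0) -> list:
    """Return model ids served at base_url, or an empty list if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(resp.read())
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return []


if __name__ == "__main__":
    for name, url in BACKENDS.items():
        models = list_models(url)
        print(f"{name}: {models if models else 'not reachable or no models loaded'}")
```

The ids printed here are the values to pass as `model` in any OpenAI-style client call against that backend.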

    import openai

    client = openai.OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="deepseek-coder-v2",
        messages=[{"role": "user", "content": "Add error handling to this function..."}],
        tools=[...],  # LM Studio supports tool calling
    )

Models worth trying on M5 Max 128GB
Your hardware dictates your options. Here’s what I’ve tested and what I’d recommend:
| Model | Size | Quantization | RAM | Verdict |
|---|---|---|---|---|
| DeepSeek Coder V2 72B | 72B | Q4 | ~45GB | Best coding ability overall |
| Qwen2.5-Coder 32B | 32B | Q4 | ~20GB | Fast, solid for single-file tasks |
| CodeLLaMA 70B | 70B | Q4 | ~40GB | Reliable, well-tested |
| Yi-Coder 34B | 34B | Q4 | ~21GB | Strong Chinese + English coding |
| Command R+ 104B | 104B | Q3 | ~55GB | Good for complex reasoning |
If you have 128GB of unified memory, the 70B-72B range is the sweet spot. Drop to 32B if you want speed, jump to 104B if you need reasoning depth and can tolerate slower responses.
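The RAM column can be sanity-checked with back-of-the-envelope arithmetic: weights take roughly parameters times bits per weight divided by eight, plus some overhead. The sketch below assumes about 4.5 bits per weight for a typical 4-bit quantization and a flat 4 GB allowance for the KV cache and runtime buffers. Both figures are my assumptions, and real usage depends on the quantization format and context length.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def estimate_total_gb(
    params_billions: float,
    bits_per_weight: float = 4.5,  # assumed average for 4-bit quantization
    overhead_gb: float = 4.0,      # assumed allowance for KV cache and buffers
) -> float:
    """Weights plus a flat overhead allowance; a rough planning figure only."""
    return estimate_weight_gb(params_billions, bits_per_weight) + overhead_gb


for size in (32, 72):
    print(f"{size}B at ~4.5 bits/weight: about {estimate_total_gb(size):.1f} GB")
```

Under these assumptions a 72B model lands around 44.5 GB and a 32B model around 22 GB, which is in the same range as the table. Longer contexts grow the KV cache well beyond the flat allowance used here.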
The trade-off you need to accept
No local setup fully matches Claude Code’s agentic loop today. The gap is in reliable tool execution, long context handling, and the polish of a purpose-built orchestration layer. But for day-to-day coding — completion, chat, single-file edits, and even moderate refactoring — a well-configured local stack gets you 80% of the way there.
The bigger benefit? No quotas, no data leaving your machine, and no subscription fees. After two weeks of daily use, I cancelled one of my cloud AI subscriptions. For me, that trade-off is worth it.
Summary
In this post, I compared Ollama, LM Studio, MLX, llama.cpp, and Aider for running local coding LLMs on a Mac. My recommendation: Ollama + Continue.dev for daily coding simplicity, LM Studio + Aider for agentic workflows, and MLX only if you need maximum throughput on Apple Silicon.