Ollama vs LM Studio vs Local AI Coding Tools: Best Setup for Running Agentic Models on Mac

I was deep into a refactoring session with Claude Code when my API quota ran out. Again. That’s when I started looking seriously at local models — not as a toy, but as a daily driver for real coding work. The problem is the landscape is fragmented: Ollama, LM Studio, llama.cpp, MLX, GPT4All, LocalAI. Each one claims to be the best, and none of them fully replicate the agentic loop Claude Code gives you.

After a few weeks of swapping backends and testing on my M5 Max 128GB, here’s what I found.

Four local AI coding tools compared: Ollama for simple setup, LM Studio for GUI and agentic server, MLX for peak Apple Silicon performance, Aider for agentic terminal coding.

Why local models still can’t fully replace cloud coding AI

The core gap is agentic behavior. Most local setups handle tab completion and chat just fine — Continue.dev with Ollama gave me decent inline suggestions on day one. But the moment you ask for “refactor this module, update all imports, and write tests”, things fall apart. Local models lack reliable tool-calling, multi-step planning, and git-aware editing.

Cloud services like Claude Code or GitHub Copilot agent mode work because they run massive models with purpose-built orchestration layers. Running a 70B model locally gets you close on raw reasoning, but you still need the right tooling to wire up the agent loop.

The tool comparison

I tested five approaches. Here’s how they stack up:

| Tool | Best For | Agentic Support | Mac Performance |
| --- | --- | --- | --- |
| Ollama | Quick setup, model management, IDE integration | Limited (via Continue.dev) | Good (llama.cpp backend) |
| LM Studio | GUI, server mode, model testing | Better (OpenAI-compatible API) | Excellent (MLX support) |
| MLX | Max performance on Apple Silicon | Minimal (low-level) | Best (Apple native) |
| llama.cpp | Raw performance, customization | Via server mode | Good |
| Aider | Agentic coding in terminal | Strongest (multi-file editing, git integration) | Good (any backend) |

Ollama — simplest entry point

Ollama is the easiest to get started with. One brew install ollama and you’re pulling models by name. The ollama pull deepseek-coder-v2:72b-instruct-q4_K_M workflow is dead simple. It’s built on llama.cpp under the hood, so you get solid performance without tweaking.

Where it falls short: agentic support. Ollama’s API is basic. You can hook it into Continue.dev for chat and completion, but don’t expect it to orchestrate multi-file edits on its own.
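If you want to script against Ollama directly rather than through an IDE plugin, a minimal sketch using only the standard library might look like this. It assumes Ollama is listening on its default port 11434 and uses its /api/generate endpoint; the helper names build_generate_request and generate are made up for this example, and the model tag in the usage note is a placeholder.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text (needs a running Ollama)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With a model already pulled, something like generate("qwen2.5-coder:7b", "Reverse a string in Python") should return plain text. Note that this is a single request and response, with no planning or file editing on top.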

LM Studio — the agentic contender

LM Studio surprised me. Its built-in server mode exposes an OpenAI-compatible API with tool-calling support. That means you can point any tool that speaks the OpenAI format — Aider, Open Interpreter, custom scripts — at your local model and get semi-agentic behavior.
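To make "semi-agentic" concrete, here is a rough sketch of the loop a client has to run on top of a tool-calling endpoint. It is not LM Studio's or Aider's implementation, just the general shape under the OpenAI chat format: the model either answers or asks for a tool, and the client executes the tool and feeds the result back. The read_file tool, the LOCAL_TOOLS registry, and run_agent are hypothetical names, and the model call is passed in as a function so the loop itself stays independent of any server.

```python
import json
from typing import Callable

# Tool schema in the OpenAI "function" format; `read_file` is a made-up example tool.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a text file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]


def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Maps tool names from the schema to the local functions that implement them.
LOCAL_TOOLS: dict[str, Callable[..., str]] = {"read_file": read_file}


def run_agent(chat: Callable[[list[dict]], dict], prompt: str, max_steps: int = 5) -> str:
    """Call the model until it answers without requesting a tool.

    `chat` takes the message list and returns one assistant message as a dict.
    """
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = chat(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        for call in tool_calls:
            fn = LOCAL_TOOLS[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": fn(**args),
            })
    raise RuntimeError("model kept requesting tools; giving up")
```

With the openai Python client, chat could be a small wrapper that calls client.chat.completions.create(model=..., messages=msgs, tools=TOOLS) and returns choices[0].message.model_dump(exclude_none=True). How reliably a given local model emits well-formed tool calls is a separate question, which is why the loop caps the number of steps.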

On the M5 Max, LM Studio with the MLX backend runs Qwen2.5-Coder 32B at a usable speed (~20 tok/s). For a 72B model, you’ll want 4-bit quantization and some patience, but it works.

MLX — peak Apple Silicon performance

MLX is Apple’s machine learning framework, and it’s fast. Really fast. Running a model through MLX instead of llama.cpp on the same hardware gave me roughly 30% more tokens per second. The catch: MLX is a framework, not a user-friendly tool. You need to write Python scripts to load and run models. Unless you’re optimizing every last bit of throughput, LM Studio (which can use MLX under the hood) is the better bet.
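For a sense of what "write Python scripts" means in practice, here is a small sketch built on the mlx-lm package (pip install mlx-lm) and its load and generate functions. The helper names tokens_per_second and make_mlx_generator are my own, and the measurement is deliberately crude: it times one call including prompt processing and counts tokens by re-encoding the output, so treat the number as a ballpark rather than a benchmark.

```python
import time
from typing import Callable


def tokens_per_second(generate_fn: Callable[[str], int], prompt: str) -> float:
    """Time one generation call; `generate_fn` must return the number of tokens produced."""
    start = time.perf_counter()
    n_tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")


def make_mlx_generator(repo: str, max_tokens: int = 256) -> Callable[[str], int]:
    """Load a model with mlx-lm and return a function that generates and counts tokens."""
    from mlx_lm import load, generate  # imported lazily; requires the mlx-lm package

    model, tokenizer = load(repo)

    def run(prompt: str) -> int:
        text = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
        return len(tokenizer.encode(text))  # approximate count via re-encoding

    return run
```

Usage would be along the lines of tokens_per_second(make_mlx_generator("mlx-community/Qwen2.5-Coder-32B-Instruct-4bit"), "Write a binary search in Python"), where the repository name is an assumption about what is published in MLX format. Because tokens_per_second only needs a callable, the same harness can time any other backend for a side-by-side comparison.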

Aider — closest to Claude Code in a terminal

Aider is the only tool in this list that gives me real agentic coding. It understands your git repo, makes multi-file changes, and commits them with sensible messages. It supports --architect mode for planning before editing. I connected Aider to a local Ollama backend:

aider-local.sh
aider --model ollama/deepseek-coder-v2:72b-instruct-q4_K_M \
  --architect \
  --no-stream

The --architect mode is key: the model first describes what it plans to change, then implements it. With a 72B model, the quality is surprisingly close to what I get from cloud services. The main downside is the context window: most local models max out at 32K-128K tokens, so large codebases need careful file selection.
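"Careful file selection" can be approximated with a token budget. The sketch below is one possible heuristic, not something Aider does for you: it assumes roughly four characters per token (a rule of thumb, not a real tokenizer), reserves part of the window for the prompt and the reply, and keeps the smallest files first. The names estimate_tokens and select_files and the default reserve are illustrative.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text and code, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Cheap token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def select_files(files: dict[str, str], context_tokens: int, reserve: int = 8_000) -> list[str]:
    """Greedily keep files, smallest first, until the token budget is used up.

    `files` maps a path to its contents; `reserve` leaves room for the
    prompt and the model's reply.
    """
    budget = context_tokens - reserve
    sized = sorted((estimate_tokens(text), path) for path, text in files.items())
    chosen = []
    for cost, path in sized:
        if cost > budget:
            break  # everything after this is at least as large
        chosen.append(path)
        budget -= cost
    return chosen
```

The resulting paths can then be passed to Aider as positional arguments. Smallest-first is only one possible ordering; sorting by relevance to the task would be a better choice when you have a way to measure it.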

What I run daily

After all the testing, I settled on a hybrid stack:

  • Backend: LM Studio with Qwen2.5-Coder 32B (Q4) for agentic tasks, Ollama with a 7B model for fast autocomplete
  • IDE: Continue.dev extension in VS Code, configured with both backends — Ollama for inline completion, LM Studio for chat
  • Terminal: Aider wired to LM Studio’s API endpoint for big refactoring sessions
  • API endpoint: http://localhost:1234/v1 — LM Studio serves OpenAI-compatible endpoints that every tool understands
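Wiring Aider to that endpoint comes down to two environment variables and a model prefix. The sketch below follows Aider's documented route for OpenAI-compatible servers (OPENAI_API_BASE, OPENAI_API_KEY, and an openai/ model prefix); the function names are hypothetical, and the model identifier has to be whatever your LM Studio server actually reports.

```python
import os
import subprocess

LM_STUDIO_URL = "http://localhost:1234/v1"  # endpoint from the list above


def aider_command(model_id: str, files: list[str]) -> tuple[list[str], dict[str, str]]:
    """Return the argv and environment for running Aider against LM Studio."""
    env = dict(os.environ)
    env["OPENAI_API_BASE"] = LM_STUDIO_URL
    env["OPENAI_API_KEY"] = "not-needed"  # placeholder; assumed not to be checked locally
    argv = ["aider", "--model", f"openai/{model_id}", "--architect", *files]
    return argv, env


def launch(model_id: str, files: list[str]) -> int:
    """Start an interactive Aider session and return its exit code."""
    argv, env = aider_command(model_id, files)
    return subprocess.run(argv, env=env).returncode
```

Keeping the command construction separate from the launch makes it easy to print the exact invocation first, which helps when a session fails because the model name does not match.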

Architecture diagram of the recommended local AI coding stack: LM Studio serves as the agentic backend for Aider and IDE chat, Ollama handles fast autocomplete, Continue.dev in VS Code integrates both, all communicating via OpenAI-compatible local endpoints.

lm-studio-agent.py
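import openai

# LM Studio's local server speaks the OpenAI API, so the stock client works.
client = openai.OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-coder-v2",  # must match the identifier of the model loaded in LM Studio
    messages=[{"role": "user", "content": "Add error handling to this function..."}],
    # tools=[...],  # LM Studio supports tool calling; pass your tool schemas here
)
print(response.choices[0].message.content)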
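A common stumbling block with scripts like this is the model name: it has to match an identifier the server actually exposes. A small sketch for checking that, assuming the server implements the OpenAI-style GET /v1/models route and returns the usual {"data": [{"id": ...}]} shape; parse_model_ids and list_local_models are names invented for this example.

```python
import json
import urllib.request


def parse_model_ids(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]


def list_local_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Ask the local server which models it exposes (needs the server running)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Calling list_local_models() before a long session is a cheap way to confirm that the server is up and to copy the exact model identifier.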

Models worth trying on M5 Max 128GB

Your hardware dictates your options. Here’s what I’ve tested and what I’d recommend:

| Model | Size | Quantization | RAM | Verdict |
| --- | --- | --- | --- | --- |
| DeepSeek Coder V2 72B | 72B | Q4 | ~45GB | Best coding ability overall |
| Qwen2.5-Coder 32B | 32B | Q4 | ~20GB | Fast, solid for single-file tasks |
| CodeLLaMA 70B | 70B | Q4 | ~40GB | Reliable, well-tested |
| Yi-Coder 34B | 34B | Q4 | ~21GB | Strong Chinese + English coding |
| Command R+ 104B | 104B | Q3 | ~55GB | Good for complex reasoning |

If you have 128GB of unified memory, the 70B-72B range is the sweet spot. Drop to 32B if you want speed, jump to 104B if you need reasoning depth and can tolerate slower responses.
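The RAM figures above can be sanity-checked with back-of-the-envelope arithmetic: weight size is roughly parameters times bits per weight divided by eight. The sketch below assumes about 4.5 bits per weight for a 4-bit quantization (quantized formats store scaling data on top of the raw 4 bits) and a fixed headroom for the KV cache, the OS, and your editor. Both numbers are assumptions to tune, and real usage will differ by runtime and context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, ram_gb: float,
         headroom_gb: float = 16.0) -> bool:
    """True if the weights plus a headroom allowance fit in unified memory."""
    return weights_gb(params_billion, bits_per_weight) + headroom_gb <= ram_gb


print(weights_gb(72, 4.5))  # 40.5
print(weights_gb(32, 4.5))  # 18.0
print(fits(72, 4.5, ram_gb=128))  # True
```

The estimates land a little below the measured figures in the table, which is what you would expect once the cache and runtime overhead are added on top of the weights.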

The trade-off you need to accept

No local setup fully matches Claude Code’s agentic loop today. The gap is in reliable tool execution, long context handling, and the polish of a purpose-built orchestration layer. But for day-to-day coding — completion, chat, single-file edits, and even moderate refactoring — a well-configured local stack gets you 80% of the way there.

The bigger benefit? No quotas, no data leaving your machine, and no subscription fees. After two weeks of daily use, I cancelled one of my cloud AI subscriptions. For me, that trade-off is worth it.

Summary

In this post, I compared Ollama, LM Studio, MLX, and Aider for running local coding LLMs on Mac. My recommendation: Ollama + Continue.dev for daily coding simplicity, LM Studio + Aider for agentic workflows, and MLX only if you need maximum throughput on Apple Silicon.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
