What Is an AI Coding Harness and Why Are Developers Building Their Own?

Jun 27, 2026

Problem

I kept seeing the word “harness” in r/ClaudeAI, on Twitter, in launch posts for tools like Claude Code, Pi, Hermes, and OpenClaw. The most-upvoted comment in the thread that started all this was plain:

I still have no clue what a harness is. — u/NGTech9

Same. The term exploded in 2025–2026 and got reused for every wrapper people wrote on top of a coding model. There is no formal spec. So I went through one thread — “Anyone building their own harness?” (72 comments) — and worked out the pattern. This post is the version I wish I had read first.

The Five Layers a Harness Wraps

In practice, every coding agent is made of the same five layers. Calling something a “harness” usually means a developer has opinions about at least one of them and decided to implement their own version.

Context manager: what history, files, tool results, and instructions are sent on each turn. This is the layer that most determines the quality of the model's output.
Tool / permission system: which shell commands, file edits, network calls, and MCP tools the agent can invoke, and which require human approval.
Loop / scheduler: how the model is driven through multi-step tasks, such as think-act-observe, retries, branch-and-merge, and parallel sub-agents.
Provider / model adapter: how to talk to Anthropic, OpenAI, Google, or a local model (Gemma, Llama, Qwen) without rewriting everything.
UI: terminal TUI, web UI (the OP literally cloned claude.ai), Telegram bot, or even a gamepad-driven interface.
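
To make the five layers concrete, here is a minimal sketch of how they might fit together in one small program. Everything in it is hypothetical: `Harness`, `EchoProvider`, the `TOOL <name>` / `DONE` reply convention, and the `list_files` tool are names I made up for illustration, not the API of any harness mentioned in the thread. The fake provider stands in for a real model so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Provider(Protocol):
    """Provider / model adapter: turns a prompt into a reply."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Hypothetical stand-in for a real model so the sketch runs offline."""

    def complete(self, prompt: str) -> str:
        # Ask for one tool call, then finish once a result is in the prompt.
        return "DONE" if "result:" in prompt else "TOOL list_files"


@dataclass
class Harness:
    provider: Provider
    tools: dict[str, Callable[[], str]]  # tool system: name -> callable
    allowed: set[str]                    # permission system: approved tool names
    history: list[str] = field(default_factory=list)

    def build_context(self, task: str) -> str:
        # Context manager: send the task plus only the last five results.
        return "\n".join([f"task: {task}", *self.history[-5:]])

    def run(self, task: str, max_steps: int = 5) -> list[str]:
        # Loop / scheduler: a plain think-act-observe loop with a step cap.
        for _ in range(max_steps):
            reply = self.provider.complete(self.build_context(task))
            if reply == "DONE":
                break
            name = reply.removeprefix("TOOL ").strip()
            if name not in self.allowed:
                self.history.append(f"result: {name} denied")
                continue
            self.history.append(f"result: {self.tools[name]()}")
        return self.history


harness = Harness(EchoProvider(), {"list_files": lambda: "main.py"}, {"list_files"})
transcript = harness.run("summarize the repo")
print(transcript)  # UI layer: in this sketch, just stdout
```

Swapping any one piece (a different `build_context`, a stricter `allowed` set, a real provider) changes the harness without touching the other layers, which is roughly what "having opinions about a layer" amounts to.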

Stacked-layers diagram showing the five layers of an AI coding harness: context manager on top, then tool/permission system, loop/scheduler, provider/model adapter, and UI at the bottom

Off-the-shelf products give you a default for every one of those. Rolling your own means you stop accepting defaults you don’t believe in.

A Harness Is a Product Decision, Not a Framework

A harness encodes a developer’s answers to questions like:

How much context do I trust the model with before it gets confused?
Do I want the agent to be able to push to git, or only stage?
Do I want it to spawn parallel sub-agents, or one linear loop I can interrupt?
Do I want this to work on a phone, in Telegram, or only in a terminal?
Do I want it locked to one provider, or swappable?
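
One way to picture "a harness is a set of answers" is as a configuration object. The sketch below is an assumption-laden illustration: `HarnessConfig`, `GitAccess`, the field names, and the default values (including the 32,000-token limit) are all invented here and do not come from any project in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class GitAccess(Enum):
    STAGE_ONLY = "stage_only"
    PUSH = "push"


@dataclass(frozen=True)
class HarnessConfig:
    """Hypothetical answer-set for the questions above."""

    max_context_tokens: int = 32_000              # how much context do I trust?
    git_access: GitAccess = GitAccess.STAGE_ONLY  # push, or only stage?
    parallel_subagents: bool = False              # sub-agents, or one linear loop?
    interfaces: tuple[str, ...] = ("terminal",)   # phone, Telegram, terminal?
    providers: tuple[str, ...] = ("anthropic",)   # locked to one, or swappable?

    def allows_command(self, command: str) -> bool:
        """Block `git push` unless the config explicitly grants push access."""
        is_push = command.split()[:2] == ["git", "push"]
        return not is_push or self.git_access is GitAccess.PUSH


cautious = HarnessConfig()
print(cautious.allows_command("git add -A"))       # True
print(cautious.allows_command("git push origin"))  # False
```

Two developers with different values in this object are, in effect, running two different harnesses, even if the rest of the code is identical.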

The thread is full of named custom projects that are basically answer-sets: OpenAnton, Buford (Rust), Flawed Code (no-intervene loop), Wolffi.sh, lumina, gamepad-cli-hub, visual-relay, Speedwave, BetterCode, laserclaudeflower. None of them are products. They are decisions.

Architecture diagram comparing Hermes WebUI, Claude Code, and OpenClaw — self-hosted vs cloud components

Why People Build Their Own

The single sharpest framing in the thread:

The real unlock of rolling your own isn’t the ui, it’s that you control exactly what goes into context each turn. The prebuilt harnesses all over-stuff. — u/agiblox
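
As a rough illustration of what "controlling what goes into context" can mean, here is a minimal sketch of a budgeted context builder. It assumes a crude four-characters-per-token estimate and a hand-assigned priority per item; `Item`, `estimate_tokens`, and `build_context` are hypothetical names, and a real harness would likely use the provider's tokenizer and a smarter ranking.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    priority: int  # higher means more important to keep


def estimate_tokens(text: str) -> int:
    # Crude assumption: about four characters per token.
    return max(1, len(text) // 4)


def build_context(items: list[Item], budget: int) -> list[str]:
    """Keep the highest-priority items that fit, in their original order."""
    ranked = sorted(range(len(items)), key=lambda i: -items[i].priority)
    kept: set[int] = set()
    used = 0
    for i in ranked:
        cost = estimate_tokens(items[i].text)
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [items[i].text for i in sorted(kept)]


items = [
    Item("system: you are a careful reviewer", priority=3),
    Item("old tool output " * 50, priority=1),
    Item("user: fix the failing test", priority=3),
]
context = build_context(items, budget=20)
print(context)
```

With a budget of 20 estimated tokens, the bulky low-priority tool output is dropped and only the system and user lines are sent. An over-stuffing harness would have sent all three.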

That same motivation shows up under four different labels:

Quality ceiling. The model’s output is bounded by what fits in its context. Whichever harness controls that input best wins.
Portability. A custom harness is the only way to keep working when the underlying model changes, gets deprecated, or becomes unaffordable.
Safety and compliance. The Speedwave team built a banking-grade harness specifically because the model should never see credentials — a constraint you can’t enforce in a general-purpose client.
Workflow fit. A long-horizon PR-diff loop that runs for “a day or 2” with a creative-bug-finding tail is not a feature in any off-the-shelf product.
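
The "model should never see credentials" constraint can be sketched as a filter that runs on tool output before it enters the context. This is only an illustration of the idea, not how Speedwave or any other project implements it: the `redact` function and its single key=value pattern are my assumptions, and a pattern this simple would miss many real secret formats.

```python
import re

# Hypothetical pattern: matches things like PASSWORD=..., token: ..., api_key=...
SECRET_PATTERN = re.compile(
    r"(?i)(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)"
)


def redact(text: str) -> str:
    """Replace the value of anything that looks like a credential assignment."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


tool_output = "DB_PASSWORD=hunter2 host=db.internal"
safe_output = redact(tool_output)
print(safe_output)  # DB_PASSWORD=[REDACTED] host=db.internal
```

The point is where the filter sits: in your own harness it can wrap every tool result, whereas a general-purpose client gives you no hook at that position.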

Common Mistakes

From the thread, the patterns that waste the most time:

Treating the UI as the hard part. Multiple commenters pushed back: the UI is “never the hard part.” State and context are.
Building from zero when Pi exists. One commenter: “find it makes more sense to just build on top of pi’s foundation. That’s kind of the whole point, it’s barebones to allow for maximum customization.”
Chasing harnesses instead of picking one. One developer bounced between OpenClaw, Hermes, and openjarvis, and only settled on OpenAnton after trying them all. Pick a base, then commit.
Over-shipping features before the loop is solid. One author who had built a large feature surface admitted: “I’m frankly thinking I invested too much on AI and it’s just awful at coding at the time.”

Where to Start

The thread settled on three stable starting points. The real question is not “which harness is best,” but “which default assumptions can I live with.” Here is the decision flow that came out of it:

Decision flowchart for choosing between Hermes WebUI, Claude Code, and OpenClaw

Fork Pi if you want maximum customization. Extend an existing harness if you want the loop, tools, and provider layer already done. Build greenfield only if the existing harnesses make an assumption you cannot live with.
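
The three options above can be written down as a tiny decision function. This is my own reading of the flow, with hypothetical names throughout; in particular, checking the deal-breaker first is an assumption about ordering that the thread does not spell out.

```python
def choose_starting_point(
    needs_max_customization: bool,
    has_dealbreaker_assumption: bool,
) -> str:
    """Map two yes/no answers to one of the three starting points."""
    if has_dealbreaker_assumption:
        # Existing harnesses bake in something you cannot live with.
        return "build greenfield"
    if needs_max_customization:
        return "fork Pi"
    # Loop, tools, and provider layer are already done for you.
    return "extend an existing harness"


print(choose_starting_point(needs_max_customization=True,
                            has_dealbreaker_assumption=False))  # fork Pi
```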

Summary

In this post, I explained what an AI coding harness actually is (the context, tools, loop, provider, and UI layers around a model) and why so many developers in 2026 are building their own. The key point is that the real win is not the UI but control over what goes into the context window on every turn.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on the topic:

👨‍💻 Reddit Discussion: Anyone building their own harness?
👨‍💻 gamepad-cli-hub on GitHub
👨‍💻 lumina on GitHub
👨‍💻 visual-relay on GitHub
👨‍💻 speedwave on GitHub

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!