Skip to content

What Features Make a Good AI Coding Harness?

I spent weeks testing different AI coding assistants. Same models, different results. Why?

Developer workspace with architecture planning

Photo by Unsplash

The answer surprised me: the harness matters more than the model. I was leaving 20+ performance points on the table because I chose tools that “had features” but implemented them poorly.

The Problem: Feature Checkboxes Lie

When I compared AI coding tools, I looked at feature lists. “Has LSP integration? Check. Subagent support? Check. MCP compatibility? Check.”

But having a feature isn’t the same as having a good feature.

I noticed this when my edits kept failing on one tool despite it claiming hash-anchored edit support. The implementation was so basic that any file modification broke it. Meanwhile, another tool with the same feature worked reliably across complex refactoring sessions.

Hash-Anchored Edits: Precision Over Fragility

I learned this lesson the hard way. Traditional line-based edits break when files change:

// Traditional approach (what I used first):
// Line 45: Change "foo" to "bar"
// Problem: What if someone added code above line 45?

Hash-anchored edits use content hashes to locate code precisely. The edit survives reformatting, line additions, and other modifications:

hash_anchor_implementation.py
import hashlib
def compute_anchor(content: str) -> str:
"""Compute content hash for anchoring edits."""
return hashlib.sha256(content.encode()).hexdigest()[:8]
def apply_anchored_edit(
file_path: str,
anchor: str,
old_content: str,
new_content: str
) -> bool:
"""Apply edit using content hash anchor."""
with open(file_path, 'r') as f:
content = f.read()
# Verify anchor matches expected location
current_anchor = compute_anchor(old_content)
if current_anchor != anchor:
raise ValueError("Content drift detected, anchor invalid")
# Apply edit
updated = content.replace(old_content, new_content)
with open(file_path, 'w') as f:
f.write(updated)
return True

This approach prevented my edits from failing during multi-file refactoring. The anchor validates the context before applying changes.

LSP Integration: Semantic Understanding

My early AI coding sessions felt like working with someone who only read the file text. No understanding of imports, types, or project structure.

LSP (Language Server Protocol) integration changed this. The AI now understands:

  • Go-to-definition: Navigates across files correctly
  • Type checking: Catches errors before applying edits
  • Autocomplete context: Knows what methods exist on objects
lsp_validation.ts
interface LSPEditValidation {
valid: boolean;
errors: Diagnostic[];
suggestions: CodeAction[];
}
async function validateEditWithLSP(
file: string,
edit: TextEdit
): Promise<LSPEditValidation> {
// Get pre-edit diagnostics
const beforeDiags = await lsp.getDiagnostics(file);
// Preview the edit
const previewDoc = applyEditPreview(file, edit);
// Get post-edit diagnostics
const afterDiags = await lsp.getDiagnostics(previewDoc);
// Check for new errors
const newErrors = afterDiags.filter(d =>
d.severity === 'error' &&
!beforeDiags.some(b => sameDiagnostic(b, d))
);
return {
valid: newErrors.length === 0,
errors: newErrors,
suggestions: await lsp.getCodeActions(file, edit.range)
};
}

This validation loop catches type errors the AI might introduce. I can rollback before the bad edit reaches my codebase.

Persistent IPython Kernel: State Across Turns

I spent hours re-importing the same libraries and reloading the same variables each time I asked the AI to run Python code. Each turn started fresh—no state preserved.

A persistent kernel solves this:

┌─────────────┐ ┌──────────────────┐
│ AI Agent │────▶│ IPython Kernel │
└─────────────┘ │ (Persistent) │
│ │ - State kept │
│ │ - Variables │
▼ │ - Imports │
┌─────────────┐ └──────────────────┘
│ Turns │ │
│ 1,2,3... │◀─────────────┘
└─────────────┘ State persists

My debugging sessions became coherent. I could explore a problem step-by-step without restarting each time.

Proper Subagent Support: Handling Complexity

Single agents struggle with complex tasks. I watched my AI try to simultaneously plan architecture, write frontend code, write backend code, and generate tests—all in one thread.

The results were messy. Context overflow. Incomplete implementations.

Subagent support lets me decompose tasks:

subagent_orchestration.ts
const plan = await plannerAgent.analyze(task);
const results = await Promise.all([
codeAgent.implement(plan.frontend),
codeAgent.implement(plan.backend),
testAgent.generateTests(plan)
]);

Or with Python and dependency resolution:

subagent_dependency.py
from dataclasses import dataclass
from typing import List
import asyncio
@dataclass
class SubagentTask:
agent_type: str
task: str
dependencies: List[str] = None
class SubagentOrchestrator:
def __init__(self):
self.agents = {}
self.results = {}
async def execute(self, tasks: List[SubagentTask]):
"""Execute subagent tasks with dependency resolution."""
while tasks:
# Find tasks with satisfied dependencies
ready = [
t for t in tasks
if self._dependencies_met(t)
]
if not ready:
raise RuntimeError("Circular dependency detected")
# Execute ready tasks in parallel
coros = [
self._run_agent(t.agent_type, t.task)
for t in ready
]
results = await asyncio.gather(*coros)
# Store results and remove completed
for task, result in zip(ready, results):
self.results[task.task] = result
tasks.remove(task)
def _dependencies_met(self, task: SubagentTask) -> bool:
if not task.dependencies:
return True
return all(d in self.results for d in task.dependencies)

This pattern handles complex workflows. Each agent specializes, and dependencies resolve automatically.

Turn Injection: Guiding Without Drift

I noticed my AI sessions drifted. The agent would start focused, then gradually lose track of the original goal.

Strategic prompt injection between turns corrects this:

turn_context.ts
function buildTurnContext(history, currentTask) {
return [
...history,
{
role: 'system',
content: generateContextualPrompt(currentTask, codebaseState)
}
];
}

This injection reminds the agent of constraints, coding standards, and the current task focus.

What I Learned from the Comparison

FeatureWithout ItWith ItWhat I Saw
Hash-anchored editsEdits fail on file changesReliable refactoring+15% edit success
LSP integrationSyntax-only guessesSemantic awareness+20% accuracy
Persistent kernelRe-run imports each turnState maintained+30% efficiency
Subagent supportSingle-threaded chaosParallel specialization+25% speed
Turn injectionContext driftGuided execution+10% consistency

The Comparison That Changed My View

HarnessLSPHash-EditKernelSubagentsMCP
Claude CodeYesYesYesYesYes
CursorYesPartialNoLimitedNo
ContinueYesNoNoNoYes
PiBasicNoNoNoNo

The pattern: “Having” a feature checkbox doesn’t mean the feature works well.

How I Evaluate Harnesses Now

  1. Test the feature, not read the list: I try hash-anchored edits on a modified file. I test LSP go-to-definition across imports.

  2. Measure performance: Edit success rate, task completion time, context window usage.

  3. Check implementation quality: Does the kernel actually persist? Do subagents communicate properly?

  4. Run a real task: A simple “add a feature” test reveals more than feature comparisons.

Conclusion

The harness determines what you get from the model. I spent too long assuming all tools were equivalent because they used similar LLMs and exposed similar interfaces.

Five features matter: hash-anchored edits for precision, LSP integration for semantic understanding, persistent kernels for state, subagent support for complexity, and turn injection for guidance.

When I switched to a harness with proper implementations of these features, my coding sessions improved dramatically. Same model, better results. The architecture was the difference.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments