How to Build a Custom AI Agent with Claude SDK and OpenClaw?

Problem

I was burning through Claude API credits at an alarming rate. My custom AI agent project consumed $50+ per week, and about half of that went to routine code generation:

Weekly API usage:
- Complex reasoning tasks: $15
- Code generation: $25
- Simple queries: $10+
Total: $50+/week = $200+/month

I needed Claude’s reasoning capabilities for complex tasks, but I didn’t want to pay API prices for simple code generation. There had to be a better way.

Environment

  • Claude API subscription (paying per token)
  • Custom AI agent project
  • Python-based development
  • Looking for cost optimization
  • Target: Reduce costs by 80%+

What I Tried First

My initial approach was simple: use Claude API for everything.

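initial_agent.py
from anthropic import Anthropic

client = Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

def run_agent(prompt: str) -> str:
    """Send every prompt to Claude, regardless of difficulty."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    # response.content is a list of content blocks; return the first block's text
    return response.content[0].text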

This worked, but the costs added up quickly:

Day 1: $8 (simple code tasks)
Day 2: $12 (complex reasoning)
Day 3: $7 (routine queries)
Week 1: $52

The problem? I was paying frontier-model API prices for tasks that didn’t require that level of capability.

The Blueprint Approach

I found a Reddit discussion about using repositories as blueprints instead of dependencies. The key insight:

Don’t import these repos as dependencies. Use them as architectural references.

project-folder/
├── openclaw/ # Architecture reference
├── nemoClaw/ # NVIDIA's agent patterns
├── claude-agent-sdk/ # Official SDK patterns
└── ollama-sdk/ # Local model integration

Each repo showed me different approaches to agent architecture:

OpenClaw: Simple tool calling, lightweight
NemoClaw: Multi-agent orchestration
Claude Agent SDK: Official patterns from Anthropic
Ollama SDK: Local model integration

Step 1: Set Up the Blueprint Repositories

First, I cloned the repositories as references:

Terminal
# Create project folder
mkdir custom-agent && cd custom-agent
# Clone as blueprints (not dependencies!)
git clone https://github.com/ggozad/openclaw.git blueprints/openclaw
git clone https://github.com/NVIDIA/NeMo-Agent-Toolkit.git blueprints/nemoClaw
git clone https://github.com/anthropics/claude-agent-sdk.git blueprints/claude-agent-sdk
# These are REFERENCE ONLY - not imported

The folder structure:

custom-agent/
├── blueprints/
│   ├── openclaw/          # Study this for tool patterns
│   ├── nemoClaw/          # Study this for orchestration
│   └── claude-agent-sdk/  # Study this for Claude integration
└── src/
    └── my_agent.py        # Your implementation

Step 2: Understand the Hybrid Architecture

After studying the blueprints, I designed a hybrid approach:

┌─────────────────────────────────────────────────────┐
│                    User Request                     │
└──────────────────────────┬──────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────┐
│             Task Router (Custom Logic)              │
│  Analyze: Is this complex reasoning or code task?   │
└──────────┬───────────────────────────────┬──────────┘
           │                               │
   Complex Reasoning                Code Generation
           │                               │
           ▼                               ▼
┌──────────────────────┐       ┌──────────────────────┐
│      Claude API      │       │     Ollama Qwen      │
│   (Pro/Max token)    │       │    (Local Model)     │
│      Expensive       │       │         Free         │
└──────────────────────┘       └──────────────────────┘

This routing logic was the key insight from the blueprints.

Step 3: Implement the Custom Agent

Based on the blueprint patterns, I built my agent:

custom_agent.py
import os

import requests
from anthropic import Anthropic


class HybridAgent:
    """
    A custom agent that routes tasks between Claude and local models.
    Architecture inspired by Claude Agent SDK and OpenClaw blueprints.
    """

    def __init__(self, claude_token: str, ollama_model: str = "qwen2.5-coder:7b"):
        self.claude_client = Anthropic(api_key=claude_token)
        self.ollama_model = ollama_model
        self.ollama_url = "http://localhost:11434/api/generate"

    def _is_complex_reasoning(self, prompt: str) -> bool:
        """Determine if task needs Claude's reasoning."""
        complex_keywords = [
            "analyze", "explain why", "design", "architecture",
            "compare", "evaluate", "strategy", "decision",
        ]
        return any(kw in prompt.lower() for kw in complex_keywords)

    def _call_claude(self, prompt: str) -> str:
        """Use Claude for complex reasoning."""
        response = self.claude_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    def _call_ollama(self, prompt: str) -> str:
        """Use local Qwen for code tasks."""
        response = requests.post(
            self.ollama_url,
            json={
                "model": self.ollama_model,
                "prompt": prompt,
                "stream": False,
            },
            timeout=120,  # Local generation can be slow; avoid hanging forever
        )
        response.raise_for_status()  # Surface HTTP errors instead of returning ""
        return response.json().get("response", "")

    def execute(self, prompt: str) -> str:
        """Route to appropriate model based on task type."""
        if self._is_complex_reasoning(prompt):
            print("Using Claude (complex reasoning)...")
            return self._call_claude(prompt)
        else:
            print("Using Ollama (code task)...")
            return self._call_ollama(prompt)


# Usage
agent = HybridAgent(claude_token=os.environ.get("ANTHROPIC_API_KEY"))
# Complex task -> Claude
result1 = agent.execute("Analyze the trade-offs between microservices and monolith architecture")
# Code task -> Ollama (free!)
result2 = agent.execute("Write a Python function to calculate fibonacci numbers")
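
One gap in the agent above: if the Ollama server is down, code tasks fail outright. Here is a minimal sketch of a fallback wrapper, assuming you would rather pay for a Claude call than return an error. `with_fallback` is a hypothetical helper, and the two stub functions stand in for `_call_ollama` and `_call_claude`.

```python
from typing import Callable


def with_fallback(primary: Callable[[str], str],
                  fallback: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that tries `primary`, then `fallback` on error or empty output."""
    def call(prompt: str) -> str:
        try:
            result = primary(prompt)
            if result.strip():
                return result
        except Exception as exc:  # e.g. a connection error from a stopped server
            print(f"Primary backend failed ({exc!r}); falling back")
        return fallback(prompt)
    return call


# Stubs standing in for the real backends (hypothetical, for illustration only)
def broken_local(prompt: str) -> str:
    raise ConnectionError("Ollama is not running")


def fake_claude(prompt: str) -> str:
    return f"[claude] {prompt}"


safe_generate = with_fallback(broken_local, fake_claude)
print(safe_generate("Write a sort function"))
```

In the real agent, `execute` could call `with_fallback(self._call_ollama, self._call_claude)(prompt)` on the code-task branch. The trade-off is that an outage silently raises your API bill, so logging each fallback is worth keeping.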

Step 4: Set Up Ollama with Qwen Coder

Install and configure the local model:

Terminal
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull Qwen Coder (optimized for code generation)
ollama pull qwen2.5-coder:7b
# Verify it works
ollama run qwen2.5-coder:7b "Write a hello world in Python"

The output:

>>> Write a hello world in Python
Here's a simple hello world program in Python:
print("Hello, World!")
This will output: Hello, World!
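
Before pointing the agent at the local model, it can help to confirm from Python that the server is up and the model is installed. This is a minimal sketch that uses Ollama’s `/api/tags` endpoint, which lists locally installed models. The helper names `model_is_installed` and `fetch_tags` are hypothetical, and the URL assumes the default local port.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumes the default Ollama address


def model_is_installed(tags_payload: dict, model: str) -> bool:
    """Check a parsed /api/tags response for an exact model name."""
    return any(m.get("name") == model for m in tags_payload.get("models", []))


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of locally installed models from the Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        ok = model_is_installed(fetch_tags(), "qwen2.5-coder:7b")
        print("model ready" if ok else "model missing: run `ollama pull qwen2.5-coder:7b`")
    except (OSError, ValueError) as exc:  # connection refused, timeout, or bad JSON
        print(f"Ollama server not reachable: {exc}")
```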

Step 5: Test the Cost Reduction

I ran a week-long comparison:

Before (all Claude API):
- Total cost: $52
- All tasks processed by Claude
After (hybrid approach):
- Claude costs: $12 (only complex reasoning)
- Ollama costs: $0 (local, free)
- Total cost: $12
Savings: $40/week = 77% reduction!

The breakdown:

Task Type          | Before | After | Savings
-------------------|--------|-------|--------
Code generation    | $25    | $0    | $25
Simple queries     | $10    | $0    | $10
Complex reasoning  | $15    | $12   | $3
Other tasks        | $2     | $0    | $2
-------------------|--------|-------|--------
Total              | $52    | $12   | $40
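
The percentage can be checked directly from the table. A short sketch using only the weekly figures above:

```python
# Weekly costs in USD, copied from the breakdown table
before = {"Code generation": 25, "Simple queries": 10,
          "Complex reasoning": 15, "Other tasks": 2}
after = {"Code generation": 0, "Simple queries": 0,
         "Complex reasoning": 12, "Other tasks": 0}

total_before = sum(before.values())
total_after = sum(after.values())
savings = total_before - total_after
reduction_pct = 100 * savings / total_before

print(f"Before: ${total_before}, after: ${total_after}")
print(f"Savings: ${savings}/week ({reduction_pct:.1f}% reduction)")
```

This gives 76.9%, which rounds to the 77% quoted and falls just short of the 80% target.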

Why This Works

The blueprint approach taught me three key lessons:

1. Repositories as Architecture Guides, Not Dependencies

When I first tried to use OpenClaw as a dependency, I got conflicts:

ImportError: cannot import name 'Agent' from 'claw'
Version conflicts with existing packages

The solution was to read the source code, understand the patterns, and implement my own version:

WRONG:
from openclaw import Agent # Dependency approach
RIGHT:
# Study openclaw/agent.py
# Understand the tool calling pattern
# Implement your own version in my_agent.py
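
To make “implement your own version” concrete, here is a minimal sketch of a self-written tool registry. It is not OpenClaw’s actual design. `ToolRegistry`, `register`, `call`, and the `word_count` tool are hypothetical names chosen for this example.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Map tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable:
        """Decorator that registers a function under `name`."""
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = func
            return func
        return decorator

    def call(self, name: str, **kwargs) -> str:
        """Run a registered tool, or raise KeyError for an unknown name."""
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("word_count")
def word_count(text: str) -> str:
    """Example tool: count whitespace-separated words."""
    return str(len(text.split()))


print(registry.call("word_count", text="blueprints not dependencies"))
```

A few dozen lines like these have no external packages to conflict with, which is the point of treating the repos as references.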

2. Task Routing is Critical

Not all tasks need Claude. The blueprints showed me how to analyze task complexity:

Needs Claude (complex):
- Architecture decisions
- Multi-step reasoning
- Ambiguous requirements
- Code review and analysis
Works with Local Model (simple):
- Code generation from clear specs
- Format conversions
- Simple queries
- Routine documentation

3. Token Management Matters

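Credentials change over time: API keys get rotated, and short-lived tokens from an auth service expire. I added refresh logic that reloads the credential on a fixed interval: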

token_manager.py
import os
from datetime import datetime, timedelta


class TokenManager:
    """Manage Claude token lifecycle."""

    def __init__(self, token: str, refresh_interval_hours: int = 8):
        self.token = token
        self.last_refresh = datetime.now()
        self.refresh_interval = timedelta(hours=refresh_interval_hours)

    def get_token(self) -> str:
        """Get valid token, refresh if needed."""
        if datetime.now() - self.last_refresh > self.refresh_interval:
            self._refresh_token()
        return self.token

    def _refresh_token(self) -> None:
        """Refresh token from environment or auth service."""
        # Implementation depends on your auth setup.
        # Keep the current token if the environment variable is unset.
        self.token = os.environ.get("ANTHROPIC_API_KEY", self.token)
        self.last_refresh = datetime.now()
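
Refresh-on-interval logic is easier to verify when the clock and the credential source are injected rather than hard-coded. This is a minimal sketch of that variant. `RefreshingCredential`, the `loader` and `now` parameters, and the fake loader are all hypothetical and are not part of any SDK.

```python
from datetime import datetime, timedelta
from typing import Callable


class RefreshingCredential:
    """Cache a credential and reload it after `max_age` has elapsed."""

    def __init__(self, loader: Callable[[], str], max_age: timedelta,
                 now: Callable[[], datetime] = datetime.now) -> None:
        self._loader = loader
        self._max_age = max_age
        self._now = now
        self._value = loader()
        self._loaded_at = now()

    def get(self) -> str:
        """Return the cached value, reloading it first if it is too old."""
        if self._now() - self._loaded_at > self._max_age:
            self._value = self._loader()
            self._loaded_at = self._now()
        return self._value


# Simulate time passing with a fake clock and a loader that counts its calls
clock = {"t": datetime(2025, 1, 1, 0, 0)}
calls = []


def fake_loader() -> str:
    calls.append(1)
    return f"key-v{len(calls)}"


cred = RefreshingCredential(fake_loader, timedelta(hours=8), now=lambda: clock["t"])
first = cred.get()                  # within 8 hours: cached value
clock["t"] += timedelta(hours=9)
second = cred.get()                 # past 8 hours: loader runs again
print(first, second)
```

In production the loader would be something like `lambda: os.environ["ANTHROPIC_API_KEY"]` or a call to your auth service.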

Common Mistakes I Made

Mistake 1: Using Repos as Dependencies

BAD: pip install openclaw # Adds unnecessary complexity
GOOD: Read the code, understand patterns, implement yourself

Mistake 2: Over-Engineering the Router

My first router was too complex:

# OVERLY COMPLEX (pseudocode)
def route_task(prompt):
    # 50 lines of ML classification
    # Sentiment analysis
    # Topic modeling
    # ...
    return model_choice

# SIMPLE AND EFFECTIVE
COMPLEX_KEYWORDS = [
    "analyze", "explain why", "design", "architecture",
    "compare", "evaluate", "strategy", "decision",
]

def route_task(prompt):
    if any(kw in prompt.lower() for kw in COMPLEX_KEYWORDS):
        return "claude"
    return "ollama"
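
One caveat with plain substring matching is that "design" also matches "designated", so a trivial rename request gets routed to Claude. A minimal sketch of a whole-word variant, assuming the same keyword list as the agent. The `_COMPLEX_RE` name is hypothetical.

```python
import re

COMPLEX_KEYWORDS = ["analyze", "explain why", "design", "architecture",
                    "compare", "evaluate", "strategy", "decision"]

# \b anchors each keyword to word boundaries so "design" does not match "designated"
_COMPLEX_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(kw) for kw in COMPLEX_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def route_task(prompt: str) -> str:
    """Return "claude" for prompts containing a whole-word complex keyword."""
    return "claude" if _COMPLEX_RE.search(prompt) else "ollama"


print(route_task("Design a caching strategy"))
print(route_task("Rename the designated_port variable"))
```

The trade-off is that inflected forms such as "analyzing" no longer match, so you may want to list those variants explicitly.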

Mistake 3: Not Testing Local Model Quality

Qwen Coder worked well for Python but struggled with Rust:

Task: "Write a Rust async function"
Ollama output: Syntax errors, outdated patterns
Claude output: Correct, idiomatic Rust code
Solution: Route Rust tasks to Claude, Python tasks to Ollama
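
The Rust finding can be folded into the router as a language check. A minimal sketch, assuming the prompt names the language in plain words. `CLAUDE_ONLY_LANGUAGES` and `route_by_language` are hypothetical names, and the set holds only the one language reported above.

```python
import re

# Languages the local model handled poorly; extend after testing your own workloads
CLAUDE_ONLY_LANGUAGES = {"rust"}


def needs_claude_for_language(prompt: str) -> bool:
    """True when the prompt names a language the local model handles poorly."""
    # Tokenize into whole words so "trust" does not count as "rust"
    words = set(re.findall(r"[a-z+#]+", prompt.lower()))
    return bool(words & CLAUDE_ONLY_LANGUAGES)


def route_by_language(prompt: str) -> str:
    """Return "claude" for flagged languages, "ollama" otherwise."""
    return "claude" if needs_claude_for_language(prompt) else "ollama"


print(route_by_language("Write a Rust async function"))
print(route_by_language("Write a Python function to calculate fibonacci numbers"))
```

In the full agent, this check would run alongside the keyword check, so a prompt goes to Claude if either one fires.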

Alternative: Using Claude Pro/Max Tokens

Some Reddit users suggested using Claude Pro/Max subscription tokens instead of API keys:

API pricing (Claude Sonnet 4):
- $3 per million input tokens
- $15 per million output tokens
Pro/Max subscription:
- Flat monthly rate (Pro is about $20/month; Max tiers cost more)
- Usage capped by plan limits
- Works for personal projects

This approach requires extracting tokens from the web interface. I haven’t implemented this personally, but it’s another cost optimization path.
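
Whether a flat rate beats per-token pricing depends on volume. Here is a rough sketch of the break-even point, using the API rates listed above. The $20 flat rate and the 1,000-token averages are illustrative assumptions, and both function names are hypothetical.

```python
INPUT_USD_PER_MTOK = 3.00    # API rates quoted in this post; check current pricing
OUTPUT_USD_PER_MTOK = 15.00


def monthly_api_cost(requests_per_month: float, avg_input_tokens: float,
                     avg_output_tokens: float) -> float:
    """Estimated monthly API spend in USD for a given request volume."""
    per_request = (avg_input_tokens * INPUT_USD_PER_MTOK
                   + avg_output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return requests_per_month * per_request


def break_even_requests(flat_rate_usd: float, avg_input_tokens: float,
                        avg_output_tokens: float) -> float:
    """Requests per month at which API spend equals the flat rate."""
    return flat_rate_usd / monthly_api_cost(1, avg_input_tokens, avg_output_tokens)


# Assumed workload: 1,000 input and 1,000 output tokens per request, $20 flat rate
print(f"{break_even_requests(20, 1_000, 1_000):.0f} requests/month")
```

Below the break-even volume, paying per token is cheaper. Above it, a flat rate costs less, provided the plan limits allow that much usage.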

Summary

In this post, I explained how to build a custom AI agent that reduces costs by 77% using a hybrid approach:

  1. Use blueprints, not dependencies: Study OpenClaw, NemoClaw, and Claude Agent SDK for architectural patterns, then implement your own version.

  2. Route tasks intelligently: Send complex reasoning to Claude, code generation to local models like Ollama Qwen.

  3. Start simple: A keyword-based router works surprisingly well. Add complexity only when needed.

  4. Test local model quality: Qwen Coder excels at Python but may struggle with other languages.

The key insight: you don’t need Claude for everything. Match the model to the task complexity.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
