How to Build a Custom AI Agent with Claude SDK and OpenClaw?
Problem
I was burning through Claude API credits at an alarming rate. My custom AI agent project consumed $50+ per week just for routine code generation tasks:
Weekly API usage:- Complex reasoning tasks: $15- Code generation: $25- Simple queries: $10+Total: $50+/week = $200+/monthI needed Claude’s reasoning capabilities for complex tasks, but I didn’t want to pay API prices for simple code generation. There had to be a better way.
Environment
- Claude API subscription (paying per token)
- Custom AI agent project
- Python-based development
- Looking for cost optimization
- Target: Reduce costs by 80%+
What I Tried First
My initial approach was simple: use Claude API for everything.
from anthropic import Anthropic
client = Anthropic()
def run_agent(prompt: str): response = client.messages.create( model="claude-sonnet-4-20250514", max_tokens=2048, messages=[{"role": "user", "content": prompt}] ) return response.contentThis worked, but the costs added up quickly:
Day 1: $8 (simple code tasks)Day 2: $12 (complex reasoning)Day 3: $7 (routine queries)Week 1: $52The problem? I was using Claude’s most expensive model for tasks that didn’t require it.
The Blueprint Approach
I found a Reddit discussion about using repositories as blueprints instead of dependencies. The key insight:
Don’t import these repos as dependencies. Use them as architectural references.
project-folder/├── openclaw/ # Architecture reference├── nemoClaw/ # NVIDIA's agent patterns├── claude-agent-sdk/ # Official SDK patterns└── ollama-sdk/ # Local model integrationEach repo showed me different approaches to agent architecture:
OpenClaw: Simple tool calling, lightweightNemoClaw: Multi-agent orchestrationClaude Agent SDK: Official patterns from AnthropicOllama SDK: Local model integrationStep 1: Set Up the Blueprint Repositories
First, I cloned the repositories as references:
# Create project foldermkdir custom-agent && cd custom-agent
# Clone as blueprints (not dependencies!)git clone https://github.com/ggozad/openclaw.git blueprints/openclawgit clone https://github.com/NVIDIA/NeMo-Agent-Toolkit.git blueprints/nemoClawgit clone https://github.com/anthropics/claude-agent-sdk.git blueprints/claude-agent-sdk
# These are REFERENCE ONLY - not importedThe folder structure:
custom-agent/├── blueprints/│ ├── openclaw/ # Study this for tool patterns│ ├── nemoClaw/ # Study this for orchestration│ └── claude-agent-sdk/ # Study this for Claude integration└── src/ └── my_agent.py # Your implementationStep 2: Understand the Hybrid Architecture
After studying the blueprints, I designed a hybrid approach:
┌─────────────────────────────────────────────────────┐│ User Request │└─────────────────────┬───────────────────────────────┘ │ ▼┌─────────────────────────────────────────────────────┐│ Task Router (Custom Logic) ││ Analyze: Is this complex reasoning or code task? │└─────────┬───────────────────────────┬───────────────┘ │ │ Complex Reasoning Code Generation │ │ ▼ ▼┌──────────────────┐ ┌──────────────────┐│ Claude API │ │ Ollama Qwen ││ (Pro/Max token) │ │ (Local Model) ││ Expensive │ │ Free │└──────────────────┘ └──────────────────┘This routing logic was the key insight from the blueprints.
Step 3: Implement the Custom Agent
Based on the blueprint patterns, I built my agent:
import osfrom anthropic import Anthropicimport requests
class HybridAgent: """ A custom agent that routes tasks between Claude and local models. Architecture inspired by Claude Agent SDK and OpenClaw blueprints. """
def __init__(self, claude_token: str, ollama_model: str = "qwen2.5-coder:7b"): self.claude_client = Anthropic(api_key=claude_token) self.ollama_model = ollama_model self.ollama_url = "http://localhost:11434/api/generate"
def _is_complex_reasoning(self, prompt: str) -> bool: """Determine if task needs Claude's reasoning.""" complex_keywords = [ "analyze", "explain why", "design", "architecture", "compare", "evaluate", "strategy", "decision" ] return any(kw in prompt.lower() for kw in complex_keywords)
def _call_claude(self, prompt: str) -> str: """Use Claude for complex reasoning.""" response = self.claude_client.messages.create( model="claude-sonnet-4-20250514", max_tokens=2048, messages=[{"role": "user", "content": prompt}] ) return response.content[0].text
def _call_ollama(self, prompt: str) -> str: """Use local Qwen for code tasks.""" response = requests.post( self.ollama_url, json={ "model": self.ollama_model, "prompt": prompt, "stream": False } ) return response.json().get("response", "")
def execute(self, prompt: str) -> str: """Route to appropriate model based on task type.""" if self._is_complex_reasoning(prompt): print("Using Claude (complex reasoning)...") return self._call_claude(prompt) else: print("Using Ollama (code task)...") return self._call_ollama(prompt)
# Usageagent = HybridAgent(claude_token=os.environ.get("ANTHROPIC_API_KEY"))
# Complex task -> Clauderesult1 = agent.execute("Analyze the trade-offs between microservices and monolith architecture")
# Code task -> Ollama (free!)result2 = agent.execute("Write a Python function to calculate fibonacci numbers")Step 4: Set Up Ollama with Qwen Coder
Install and configure the local model:
# Install Ollamacurl -fsSL https://ollama.ai/install.sh | sh
# Pull Qwen Coder (optimized for code generation)ollama pull qwen2.5-coder:7b
# Verify it worksollama run qwen2.5-coder "Write a hello world in Python"The output:
>>> Write a hello world in Python
Here's a simple hello world program in Python:
print("Hello, World!")
This will output: Hello, World!Step 5: Test the Cost Reduction
I ran a week-long comparison:
Before (all Claude API):- Total cost: $52- All tasks processed by Claude
After (hybrid approach):- Claude costs: $12 (only complex reasoning)- Ollama costs: $0 (local, free)- Total cost: $12
Savings: $40/week = 77% reduction!The breakdown:
Task Type | Before | After | Savings-------------------|------------|------------|--------Code generation | $25 | $0 | $25Simple queries | $10 | $0 | $10Complex reasoning | $15 | $12 | $3Other tasks | $2 | $0 | $2-------------------|------------|------------|--------Total | $52 | $12 | $40Why This Works
The blueprint approach taught me three key lessons:
1. Repositories as Architecture Guides, Not Dependencies
When I first tried to use OpenClaw as a dependency, I got conflicts:
ImportError: cannot import name 'Agent' from 'claw'Version conflicts with existing packagesThe solution was to read the source code, understand the patterns, and implement my own version:
WRONG:from openclaw import Agent # Dependency approach
RIGHT:# Study openclaw/agent.py# Understand the tool calling pattern# Implement your own version in my_agent.py2. Task Routing is Critical
Not all tasks need Claude. The blueprints showed me how to analyze task complexity:
Needs Claude (complex):- Architecture decisions- Multi-step reasoning- Ambiguous requirements- Code review and analysis
Works with Local Model (simple):- Code generation from clear specs- Format conversions- Simple queries- Routine documentation3. Token Management Matters
Claude tokens expire. I added refresh logic:
import timefrom datetime import datetime, timedelta
class TokenManager: """Manage Claude token lifecycle."""
def __init__(self, token: str, refresh_interval_hours: int = 8): self.token = token self.last_refresh = datetime.now() self.refresh_interval = timedelta(hours=refresh_interval_hours)
def get_token(self) -> str: """Get valid token, refresh if needed.""" if datetime.now() - self.last_refresh > self.refresh_interval: self._refresh_token() return self.token
def _refresh_token(self): """Refresh token from environment or auth service.""" # Implementation depends on your auth setup self.token = os.environ.get("ANTHROPIC_API_KEY") self.last_refresh = datetime.now()Common Mistakes I Made
Mistake 1: Using Repos as Dependencies
BAD: pip install openclaw # Adds unnecessary complexityGOOD: Read the code, understand patterns, implement yourselfMistake 2: Over-Engineering the Router
My first router was too complex:
# OVERLY COMPLEXdef route_task(prompt): # 50 lines of ML classification # Sentiment analysis # Topic modeling # ... return model_choice
# SIMPLE AND EFFECTIVEdef route_task(prompt): if any(kw in prompt.lower() for kw in COMPLEX_KEYWORDS): return "claude" return "ollama"Mistake 3: Not Testing Local Model Quality
Qwen Coder works well for Python, but struggled with Rust:
Task: "Write a Rust async function"Ollama output: Syntax errors, outdated patternsClaude output: Correct, idiomatic Rust code
Solution: Route Rust tasks to Claude, Python tasks to OllamaAlternative: Using Claude Pro/Max Tokens
Some Reddit users suggested using Claude Pro/Max subscription tokens instead of API keys:
API pricing:- $3 per million input tokens- $15 per million output tokens
Pro/Max subscription:- $20-40/month flat rate- Unlimited usage within limits- Works for personal projectsThis approach requires extracting tokens from the web interface. I haven’t implemented this personally, but it’s another cost optimization path.
Summary
In this post, I explained how to build a custom AI agent that reduces costs by 77% using a hybrid approach:
-
Use blueprints, not dependencies: Study OpenClaw, NemoClaw, and Claude Agent SDK for architectural patterns, then implement your own version.
-
Route tasks intelligently: Send complex reasoning to Claude, code generation to local models like Ollama Qwen.
-
Start simple: A keyword-based router works surprisingly well. Add complexity only when needed.
-
Test local model quality: Qwen Coder excels at Python but may struggle with other languages.
The key insight: you don’t need Claude for everything. Match the model to the task complexity.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Claude Agent SDK GitHub
- 👨💻 OpenClaw GitHub Repository
- 👨💻 Ollama Official Website
- 👨💻 Reddit Discussion: Building Custom AI Agents
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments