How to Build a Self-Hosted AI Assistant with OpenClaw

Problem

I wanted a personal AI assistant that runs on my own hardware. The problem with most solutions:

  • They lock you into a single model provider
  • They don’t remember context across sessions
  • They require multiple apps/interfaces
  • They send data to multiple cloud services

What I needed was something that could route requests to different models based on task type, maintain persistent memory, and work through a single interface I already use daily.

Purpose

Build a self-hosted AI assistant using OpenClaw Gateway that:

  • Routes requests intelligently (cheap models for simple tasks, powerful models for complex ones)
  • Maintains three-tier memory across sessions
  • Uses Telegram as the only interface
  • Runs reliably as a systemd service

Environment

I set this up on a Linux VM with the following:

Environment Overview
- 12+ LLMs accessible via Ollama and API
- 9 Docker containers running various services
- 23 monitored services
- OpenClaw Gateway as the orchestration layer
- Telegram Bot API for interface

Solution

Running OpenClaw as a Systemd User Service

First, I created a systemd user service to ensure OpenClaw starts automatically and stays running:

~/.config/systemd/user/openclaw.service
[Unit]
Description=OpenClaw Gateway
After=network.target
[Service]
Type=simple
WorkingDirectory=/home/user/openclaw
ExecStart=/usr/local/bin/openclaw start
Restart=always
RestartSec=10
[Install]
WantedBy=default.target

Enable and start the service:

enable-service.sh
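systemctl --user daemon-reload
systemctl --user enable --now openclaw
systemctl --user status openclaw
# Keep the user service manager running without an active login session
loginctl enable-linger "$USER"

I initially ran OpenClaw directly in a terminal, but the process died whenever I closed the session. Running it as a user service (the --user flag) takes care of restarts, and loginctl enable-linger keeps the user's service manager running without an active login, so the gateway starts at boot and survives logout.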

Intelligent Model Routing

The key insight: different tasks need different models. Using a single model for everything is wasteful.

Here’s my routing configuration:

openclaw-config.yaml
routing:
  default:
    model: glm-5
    provider: ollama
    reason: "Cost-effective for routine queries"
  rules:
    - match:
        type: coding
        complexity: high
      route:
        model: claude-sonnet
        provider: anthropic
        reason: "Best for intricate coding tasks"
    - match:
        type: reasoning
        context_length: long
      route:
        model: kimi-k2.5
        provider: moonshot
        reason: "Handles long context well"
    - match:
        type: voice
        task: briefing
      route:
        model: minimax-m2.7
        provider: minimax
        reason: "Natural voice synthesis"
    - match:
        type: background
        priority: low
      route:
        model: glm4-flash
        provider: zhipu
        reason: "Fast and cheap for lightweight tasks"

The routing logic works like this:

  1. GLM-5 via Ollama cloud relay - Default for most queries. Cheap enough that I don’t think about cost.
  2. Claude Sonnet - Only for complex coding. The expensive model, but worth it when needed.
  3. Kimi K2.5 - Long context tasks. When I need to analyze documents or extended conversations.
  4. MiniMax M2.7 - Voice morning briefings. TTS quality matters here.
  5. GLM4-Flash - Background tasks. Fast responses, minimal cost.

I tried several variations before settling on this. Initially, I routed everything through GPT-4, but the costs added up quickly. The current setup costs about 1/10th of a single-model approach.
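
The rule matching described above can be sketched in a few lines of Python. This is only an illustration of first-match routing with a default fallback, not OpenClaw's actual implementation: the `Route` class, `RULES` list, and `pick_route` function are hypothetical names, and the sketch assumes each request has already been tagged with fields such as `type` and `complexity` (how that tagging happens is not covered here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    provider: str


# Values mirror the routing config in this article; the matcher itself is an assumption.
DEFAULT = Route("glm-5", "ollama")
RULES = [
    ({"type": "coding", "complexity": "high"}, Route("claude-sonnet", "anthropic")),
    ({"type": "reasoning", "context_length": "long"}, Route("kimi-k2.5", "moonshot")),
    ({"type": "voice", "task": "briefing"}, Route("minimax-m2.7", "minimax")),
    ({"type": "background", "priority": "low"}, Route("glm4-flash", "zhipu")),
]


def pick_route(request: dict) -> Route:
    """Return the first rule whose match keys all equal the request's values."""
    for match, route in RULES:
        if all(request.get(key) == value for key, value in match.items()):
            return route
    # Nothing matched: fall back to the cheap default model
    return DEFAULT
```

A request tagged `{"type": "coding", "complexity": "low"}` matches no rule and falls through to the default, which is the behaviour that keeps routine queries on the cheap model.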

Three-Tier Memory System

Memory was the hardest part to get right. I went through several iterations before landing on a three-tier approach.

Tier 1: Daily Markdown Logs

Directory Structure
~/.openclaw/memory/
├── 2026-03-01.md
├── 2026-03-02.md
├── 2026-03-03.md
└── ...

Every conversation gets logged automatically. The format is simple:

2026-03-23.md
## Session: 2026-03-23 08:15
**User**: What's the weather today?
**Assistant**: Currently 18°C, partly cloudy...
## Session: 2026-03-23 14:30
**User**: Remind me about the dentist appointment
**Assistant**: I've noted your dentist appointment...

This gives me automatic session tracking without any curation effort.
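
As a rough sketch of how such a log could be written, the function below appends one exchange to the day's file in the format shown above. The function name `append_exchange` and its parameters are assumptions for illustration; OpenClaw's own logging code is not shown in this article.

```python
from datetime import datetime
from pathlib import Path


def append_exchange(memory_dir: Path, user_text: str, assistant_text: str,
                    now: datetime, new_session: bool = False) -> Path:
    """Append one user/assistant exchange to the daily Markdown log."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    # One file per calendar day, e.g. 2026-03-23.md
    log_path = memory_dir / f"{now:%Y-%m-%d}.md"
    lines = []
    if new_session:
        lines.append(f"## Session: {now:%Y-%m-%d %H:%M}")
    lines.append(f"**User**: {user_text}")
    lines.append(f"**Assistant**: {assistant_text}")
    # Append mode keeps earlier sessions from the same day intact
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return log_path
```

Because the file is only ever appended to, a crash mid-session loses at most the exchange being written.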

Tier 2: Curated Memory (MEMORY.md)

MEMORY.md
# Important Context
## Preferences
- Respond in English by default
- Use metric units
- Timezone: Asia/Shanghai
## Ongoing Projects
- Home automation dashboard (in progress)
- Blog migration to Astro (completed 2026-02-15)
## Important Decisions
- 2026-03-15: Switched from Zapier to n8n for automation
- 2026-03-20: Decided to use SQLite for local cache

This file contains decisions, preferences, and context I want the assistant to always know. I update it manually when something important happens.
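
If the curated file needs to be read programmatically, for example to include only some sections in a prompt, a small parser is enough. The sketch below assumes the layout shown above (`## ` section headings followed by `- ` bullet items); `parse_memory` is a hypothetical helper, not part of OpenClaw.

```python
def parse_memory(markdown: str) -> dict[str, list[str]]:
    """Group the bullet items of MEMORY.md under their '## ' section headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections
```

Lines outside any `## ` section, such as the top-level title, are ignored.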

Tier 3: ChromaDB Vector Database

memory-search.py
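import os
import chromadb
# PersistentClient replaces the legacy Settings(chroma_db_impl="duckdb+parquet")
# configuration, which Chroma 0.4 and later reject.
# expanduser() is required because "~" is not expanded automatically.
chroma = chromadb.PersistentClient(path=os.path.expanduser("~/.openclaw/chroma"))
collection = chroma.get_collection("conversations")
def search_memory(query: str, n_results: int = 5):
    """Return the n_results stored documents most similar to the query."""
    return collection.query(
        query_texts=[query],
        n_results=n_results
    )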

ChromaDB indexes everything from Tier 1 and Tier 2, enabling semantic search across the entire conversation history. When I ask “What did I decide about the API caching?”, it finds the relevant conversation even if I don’t remember exact words.
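
The indexing step is not shown in this article, so the following is a sketch under stated assumptions: each `## Session:` block of a daily log becomes one document, and the collection object exposes Chroma's `upsert(ids=..., documents=..., metadatas=...)` method. The helper names `split_sessions` and `index_log` and the ID scheme are hypothetical.

```python
import re

SESSION_HEADER = re.compile(r"^## Session: (.+)$", re.MULTILINE)


def split_sessions(log_text: str) -> list[tuple[str, str]]:
    """Split one daily log into (session timestamp, session body) pairs."""
    matches = list(SESSION_HEADER.finditer(log_text))
    sessions = []
    for i, match in enumerate(matches):
        # A session runs until the next header, or to the end of the file
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        sessions.append((match.group(1).strip(), log_text[match.end():end].strip()))
    return sessions


def index_log(collection, log_text: str, source: str) -> int:
    """Upsert each session as one document and return how many were indexed."""
    sessions = split_sessions(log_text)
    if not sessions:
        return 0
    collection.upsert(
        # Stable IDs make re-indexing the same file overwrite, not duplicate
        ids=[f"{source}#{stamp}" for stamp, _ in sessions],
        documents=[body for _, body in sessions],
        metadatas=[{"source": source, "session": stamp} for stamp, _ in sessions],
    )
    return len(sessions)
```

Indexing per session rather than per file keeps each retrieved result small enough to drop into a prompt, at the cost of losing context that spans sessions.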

The combination works because each tier serves a different purpose:

  • Tier 1: Automatic, comprehensive, no effort required
  • Tier 2: Curated, high-signal, persistent context
  • Tier 3: Searchable, semantic, cross-references everything
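
One way the three tiers could be combined into a single prompt preamble is sketched below. This is an assumption about the assembly step, not a description of what OpenClaw does internally: `build_context`, its parameters, and the section labels are all hypothetical.

```python
def build_context(curated: str, todays_log: str, retrieved: list[str],
                  log_tail_chars: int = 2000) -> str:
    """Concatenate the three memory tiers into one prompt preamble."""
    # Tier 2 is always included in full: it is short and high-signal
    parts = ["# Curated memory (Tier 2)", curated.strip()]
    if todays_log.strip():
        # Tier 1 is truncated to its most recent part to bound prompt size
        parts += ["# Today's log, most recent part (Tier 1)",
                  todays_log.strip()[-log_tail_chars:]]
    if retrieved:
        # Tier 3 contributes only the snippets returned by semantic search
        parts.append("# Related past conversations (Tier 3)")
        parts += [f"- {snippet.strip()}" for snippet in retrieved]
    return "\n\n".join(parts)
```

The ordering reflects the roles listed above: curated context first, recent chronology second, and retrieved history last.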

Telegram Integration

Telegram became the exclusive interface. No web dashboard needed.

telegram-handler.py
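import os
from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters
# Assumption: the bot token is read from the environment
TELEGRAM_TOKEN = os.environ["TELEGRAM_TOKEN"]
async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user_message = update.message.text
    user_id = update.effective_user.id
    # Route through OpenClaw. `openclaw` is the gateway client object;
    # its setup is not shown in this article.
    response = await openclaw.chat(
        message=user_message,
        user_id=user_id,
        include_memory=True
    )
    await update.message.reply_text(response)
app = Application.builder().token(TELEGRAM_TOKEN).build()
# Handle plain text only; commands such as /start are excluded
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
app.run_polling()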

I initially built a React dashboard, but found I never used it. Telegram is always on my phone, supports markdown, and handles voice messages. The bot runs as part of the OpenClaw service.

For bilingual support (French/English), I added simple language detection based on accented characters:

language-detect.py
def detect_language(text: str) -> str:
    french_chars = set('àâäéèêëïîôùûüÿçœæ')
    if any(c in french_chars for c in text.lower()):
        return "fr"
    return "en"
# Then include in context
context = {
    "language": detect_language(user_message),
    "user_id": user_id
}
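
The accent check above classifies unaccented French such as "Merci pour le rappel" as English. One possible refinement, offered here as a sketch and not as the method used in this setup, is to fall back to counting common French function words. The `FRENCH_WORDS` list and the two-hit threshold are assumptions chosen for illustration and would need tuning on real messages.

```python
import re

FRENCH_CHARS = set("àâäéèêëïîôùûüÿçœæ")
# Hypothetical word list, not from the original setup; extend as needed
FRENCH_WORDS = {"le", "la", "les", "des", "une", "que", "dans", "je", "bonjour", "merci"}


def detect_language(text: str) -> str:
    """Return "fr" for text that looks French, otherwise "en"."""
    lowered = text.lower()
    # First pass: any accented character settles it, as in the original check
    if any(char in FRENCH_CHARS for char in lowered):
        return "fr"
    # Second pass: count common French words in unaccented text
    words = re.findall(r"[a-z]+", lowered)
    hits = sum(1 for word in words if word in FRENCH_WORDS)
    # Require two hits so that one shared word does not flip the result
    return "fr" if hits >= 2 else "en"
```

Short unaccented messages with fewer than two listed words still fall back to English, so this narrows the gap without closing it.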

What Didn’t Work

I tried several things that failed:

  1. Using only ChromaDB - Lost chronological context, retrieval was noisy
  2. Routing based on keywords - Too brittle, missed intent
  3. Web dashboard - Never used it, added maintenance burden
  4. Single premium model - Expensive and slow for simple queries

Summary

After 50 days of running this setup, the key insights are:

  • Intelligent routing saves money - GLM-5 handles 80% of queries at 1% of the cost
  • Memory tiers matter - Each tier solves a different problem
  • Interface simplicity wins - One interface (Telegram) is better than many
  • Systemd reliability - Running as a service means it’s always available

The system now handles about 50-100 queries daily, costs under $5/month in API calls, and maintains context across weeks of conversations. The three-tier memory system means I can reference decisions from a month ago, and the routing ensures I’m not burning budget on simple questions.

Cost Comparison
Single GPT-4 approach: ~$45/month
Multi-model routing: ~$5/month
Savings: 90%
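
The savings figure follows directly from the two monthly totals. As a quick check of the arithmetic, using only the numbers quoted above (the function name is illustrative):

```python
def savings_fraction(baseline_cost: float, routed_cost: float) -> float:
    """Fraction of the baseline monthly cost saved by routing."""
    return 1 - routed_cost / baseline_cost


# Monthly figures quoted in the cost comparison (USD)
single_model = 45.0
multi_model = 5.0
print(f"Savings: {savings_fraction(single_model, multi_model):.1%}")
print(f"Cost ratio: {single_model / multi_model:.0f}x")
```

The exact value is 88.9%, a 9x cost ratio, which rounds to the roughly 90% and "about 1/10th" figures used in this article.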

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
