How to Build a Self-Hosted AI Assistant with OpenClaw
Problem
I wanted a personal AI assistant that runs on my own hardware. The problem with most solutions:
- They lock you into a single model provider
- They don’t remember context across sessions
- They require multiple apps/interfaces
- They send data to multiple cloud services
What I needed was something that could route requests to different models based on task type, maintain persistent memory, and work through a single interface I already use daily.
Purpose
Build a self-hosted AI assistant using OpenClaw Gateway that:
- Routes requests intelligently (cheap models for simple tasks, powerful models for complex ones)
- Maintains three-tier memory across sessions
- Uses Telegram as the only interface
- Runs reliably as a systemd service
Environment
I set this up on a Linux VM with the following:
- 12+ LLMs accessible via Ollama and API
- 9 Docker containers running various services
- 23 monitored services
- OpenClaw Gateway as the orchestration layer
- Telegram Bot API for interface
Solution
Running OpenClaw as a Systemd User Service
First, I created a systemd user service to ensure OpenClaw starts automatically and stays running:
```ini
[Unit]
Description=OpenClaw Gateway
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/user/openclaw
ExecStart=/usr/local/bin/openclaw start
Restart=always
RestartSec=10

[Install]
WantedBy=default.target
```
Enable and start the service:
```shell
systemctl --user enable openclaw
systemctl --user start openclaw
systemctl --user status openclaw
```
I initially tried running it directly in a terminal, but the process died when I closed the session. Running it as a user service (the --user flag) and enabling lingering with loginctl enable-linger keeps it running after I log out and starts it again after a reboot.
Intelligent Model Routing
The key insight: different tasks need different models. Using a single model for everything is wasteful.
Here’s my routing configuration:
```yaml
routing:
  default:
    model: glm-5
    provider: ollama
    reason: "Cost-effective for routine queries"

  rules:
    - match:
        type: coding
        complexity: high
      route:
        model: claude-sonnet
        provider: anthropic
        reason: "Best for intricate coding tasks"

    - match:
        type: reasoning
        context_length: long
      route:
        model: kimi-k2.5
        provider: moonshot
        reason: "Handles long context well"

    - match:
        type: voice
        task: briefing
      route:
        model: minimax-m2.7
        provider: minimax
        reason: "Natural voice synthesis"

    - match:
        type: background
        priority: low
      route:
        model: glm4-flash
        provider: zhipu
        reason: "Fast and cheap for lightweight tasks"
```
The routing logic works like this:
- GLM-5 via Ollama cloud relay - Default for most queries. Cheap enough that I don’t think about cost.
- Claude Sonnet - Only for complex coding. The expensive model, but worth it when needed.
- Kimi K2.5 - Long context tasks. When I need to analyze documents or extended conversations.
- MiniMax M2.7 - Voice morning briefings. TTS quality matters here.
- GLM4-Flash - Background tasks. Fast responses, minimal cost.
I tried several variations before settling on this. Initially, I routed everything through GPT-4, but the costs added up quickly. The current setup costs about 1/10th of a single-model approach.
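The rule matching described above can be sketched as a first-match lookup. This is a minimal illustration, not OpenClaw's actual implementation: the `Route` class, `pick_route` function, and the assumption that a request arrives already labeled with fields such as `type` and `complexity` are all hypothetical. How requests get classified is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    provider: str


# Mirrors the routing configuration in this section; the matching
# semantics (first rule whose fields all match wins) are an assumption.
DEFAULT_ROUTE = Route("glm-5", "ollama")
RULES = [
    ({"type": "coding", "complexity": "high"}, Route("claude-sonnet", "anthropic")),
    ({"type": "reasoning", "context_length": "long"}, Route("kimi-k2.5", "moonshot")),
    ({"type": "voice", "task": "briefing"}, Route("minimax-m2.7", "minimax")),
    ({"type": "background", "priority": "low"}, Route("glm4-flash", "zhipu")),
]


def pick_route(request: dict) -> Route:
    """Return the first rule whose match fields all equal the request's fields."""
    for match, route in RULES:
        if all(request.get(key) == value for key, value in match.items()):
            return route
    # Nothing matched: fall back to the cheap default model.
    return DEFAULT_ROUTE


print(pick_route({"type": "coding", "complexity": "high"}).model)  # claude-sonnet
print(pick_route({"type": "coding", "complexity": "low"}).model)   # glm-5
```

Because unmatched requests fall through to the default, a request only reaches an expensive model when a rule explicitly sends it there.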
Three-Tier Memory System
Memory was the hardest part to get right. I went through several iterations before landing on a three-tier approach.
Tier 1: Daily Markdown Logs
```
~/.openclaw/memory/
├── 2026-03-01.md
├── 2026-03-02.md
├── 2026-03-03.md
└── ...
```
Every conversation gets logged automatically. The format is simple:
```markdown
## Session: 2026-03-23 08:15

**User**: What's the weather today?
**Assistant**: Currently 18°C, partly cloudy...

## Session: 2026-03-23 14:30

**User**: Remind me about the dentist appointment
**Assistant**: I've noted your dentist appointment...
```
This gives me automatic session tracking without any curation effort.
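As a rough sketch of how a logger could produce this format, the function below appends one session to the file named after the session's date. The function name `append_session` and its arguments are hypothetical. OpenClaw's own logging code is not shown in this article and may work differently.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def append_session(memory_dir: Path, started: datetime, turns: list[tuple[str, str]]) -> Path:
    """Append one session to the daily log named after the session's date."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    log_path = memory_dir / f"{started:%Y-%m-%d}.md"

    lines = [f"## Session: {started:%Y-%m-%d %H:%M}", ""]
    for user_text, assistant_text in turns:
        lines.append(f"**User**: {user_text}")
        lines.append(f"**Assistant**: {assistant_text}")
        lines.append("")

    # Append mode keeps earlier sessions from the same day intact.
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write("\n".join(lines) + "\n")
    return log_path


# Demo in a temporary directory rather than the real ~/.openclaw/memory/.
with tempfile.TemporaryDirectory() as tmp:
    path = append_session(
        Path(tmp),
        datetime(2026, 3, 23, 8, 15),
        [("What's the weather today?", "Currently 18°C, partly cloudy...")],
    )
    print(path.read_text(encoding="utf-8"))
```

One file per day keeps the logs easy to browse by hand and keeps each file small.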
Tier 2: Curated Memory (MEMORY.md)
```markdown
# Important Context

## Preferences
- Respond in English by default
- Use metric units
- Timezone: Asia/Shanghai

## Ongoing Projects
- Home automation dashboard (in progress)
- Blog migration to Astro (completed 2026-02-15)

## Important Decisions
- 2026-03-15: Switched from Zapier to n8n for automation
- 2026-03-20: Decided to use SQLite for local cache
```
This file contains decisions, preferences, and context I want the assistant to always know. I update it manually when something important happens.
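Because the file is plain Markdown with `## ` headings and `- ` bullets, it is easy to read programmatically. The sketch below groups bullets under their heading. The `parse_memory` helper is hypothetical and only illustrates one way such a file could be loaded before being handed to a model. It is not how OpenClaw itself reads MEMORY.md.

```python
def parse_memory(markdown: str) -> dict[str, list[str]]:
    """Group '- ' bullet items under their nearest preceding '## ' heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


SAMPLE = """\
# Important Context

## Preferences
- Respond in English by default
- Use metric units

## Important Decisions
- 2026-03-20: Decided to use SQLite for local cache
"""

memory = parse_memory(SAMPLE)
print(memory["Preferences"])
```

Bullets that appear before any `## ` heading are ignored in this sketch.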
Tier 3: ChromaDB Vector Database
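```python
import os

import chromadb

# Expand "~" explicitly so the path does not depend on the library doing it.
CHROMA_PATH = os.path.expanduser("~/.openclaw/chroma")

# PersistentClient is the ChromaDB 0.4+ replacement for the legacy
# Settings(chroma_db_impl="duckdb+parquet", persist_directory=...) setup.
chroma = chromadb.PersistentClient(path=CHROMA_PATH)

# get_or_create_collection avoids an error on the first run, before the
# collection exists.
collection = chroma.get_or_create_collection("conversations")


def search_memory(query: str, n_results: int = 5):
    """Return the n_results stored documents most similar to the query."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
    )
    return results
```
ChromaDB indexes everything from Tier 1 and Tier 2, enabling semantic search across the entire conversation history. When I ask “What did I decide about the API caching?”, it finds the relevant conversation even if I don’t remember the exact words.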
The combination works because each tier serves a different purpose:
- Tier 1: Automatic, comprehensive, no effort required
- Tier 2: Curated, high-signal, persistent context
- Tier 3: Searchable, semantic, cross-references everything
Telegram Integration
Telegram became the exclusive interface. No web dashboard needed.
```python
import os

from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

# Assumption: the bot token is supplied via a TELEGRAM_TOKEN environment variable.
TELEGRAM_TOKEN = os.environ["TELEGRAM_TOKEN"]


async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user_message = update.message.text
    user_id = update.effective_user.id

    # Route through OpenClaw. `openclaw` is the gateway client object,
    # which is created elsewhere and not shown in this snippet.
    response = await openclaw.chat(
        message=user_message,
        user_id=user_id,
        include_memory=True,
    )

    await update.message.reply_text(response)


app = Application.builder().token(TELEGRAM_TOKEN).build()
app.add_handler(MessageHandler(filters.TEXT, handle_message))
app.run_polling()
```
I initially built a React dashboard, but found I never used it. Telegram is always on my phone, supports Markdown, and handles voice messages. The bot runs as part of the OpenClaw service.
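For bilingual support (French/English), I added simple language detection:
```python
def detect_language(text: str) -> str:
    """Return "fr" if the text contains a French accented character, else "en".

    French text without any accented characters is classified as English.
    """
    french_chars = set('àâäéèêëïîôùûüÿçœæ')
    if any(c in french_chars for c in text.lower()):
        return "fr"
    return "en"


# Then include it in the request context (named chat_context here so it
# does not shadow the handler's `context` argument).
chat_context = {
    "language": detect_language(user_message),
    "user_id": user_id,
}
```
What Didn’t Work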
I tried several things that failed:
- Using only ChromaDB - Lost chronological context, retrieval was noisy
- Routing based on keywords - Too brittle, missed intent
- Web dashboard - Never used it, added maintenance burden
- Single premium model - Expensive and slow for simple queries
Summary
After 50 days of running this setup, the key insights are:
- Intelligent routing saves money - GLM-5 handles 80% of queries at 1% of the cost
- Memory tiers matter - Each tier solves a different problem
- Interface simplicity wins - One interface (Telegram) is better than many
- Systemd reliability - Running as a service means it’s always available
The system now handles about 50-100 queries daily, costs under $5/month in API calls, and maintains context across weeks of conversations. The three-tier memory system means I can reference decisions from a month ago, and the routing ensures I’m not burning budget on simple questions.
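As a quick check on the cost figures in this summary, the arithmetic can be worked through directly. The monthly amounts and query range are the ones reported in this article. The 30-day month is an assumption, and since the routed cost is "under $5", the per-query numbers are upper bounds.

```python
SINGLE_MODEL_MONTHLY = 45.0  # USD, the GPT-4-only estimate from this article
ROUTED_MONTHLY = 5.0         # USD, the multi-model figure from this article
QUERIES_PER_DAY = (50, 100)  # reported daily range
DAYS_PER_MONTH = 30          # assumption for the per-query estimate

savings = (SINGLE_MODEL_MONTHLY - ROUTED_MONTHLY) / SINGLE_MODEL_MONTHLY
print(f"Savings: {savings:.1%}")  # Savings: 88.9%

for per_day in QUERIES_PER_DAY:
    per_query = ROUTED_MONTHLY / (per_day * DAYS_PER_MONTH)
    print(f"{per_day} queries/day: ${per_query:.4f} per query")
```

The exact saving is about 89%, which rounds to the roughly 90% (one tenth of the cost) quoted in this article.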
- Single GPT-4 approach: ~$45/month
- Multi-model routing: ~$5/month
- Savings: 90%
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.
Here are the most important links from this article, along with some further resources on this topic:
- 👨‍💻 Reddit: I gave my home a brain. Here's what 50 days of self-hosted AI looks like
- 👨‍💻 OpenClaw Documentation
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!