How to Build a Self-Hosted AI Assistant with OpenClaw

Mar 23, 2026

Problem

I wanted a personal AI assistant that runs on my own hardware. The problem with most solutions:

- They lock you into a single model provider
- They don’t remember context across sessions
- They require multiple apps/interfaces
- They send data to multiple cloud services

What I needed was something that could route requests to different models based on task type, maintain persistent memory, and work through a single interface I already use daily.

Purpose

Build a self-hosted AI assistant using OpenClaw Gateway that:

- Routes requests intelligently (cheap models for simple tasks, powerful models for complex ones)
- Maintains three-tier memory across sessions
- Uses Telegram as the only interface
- Runs reliably as a systemd service

Environment

I set this up on a Linux VM with the following:

- 12+ LLMs accessible via Ollama and API
- 9 Docker containers running various services
- 23 monitored services
- OpenClaw Gateway as the orchestration layer
- Telegram Bot API for interface

Solution

Running OpenClaw as a Systemd User Service

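First, I created a systemd user service so that OpenClaw starts automatically and is restarted if it crashes. User units normally live in ~/.config/systemd/user/, so the file below would be saved there as openclaw.service: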

[Unit]
Description=OpenClaw Gateway
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/user/openclaw
ExecStart=/usr/local/bin/openclaw start
Restart=always
RestartSec=10

[Install]
WantedBy=default.target

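Reload the user manager so it picks up the new unit, then enable and start the service. Lingering lets it run without an open login session:

systemctl --user daemon-reload
systemctl --user enable openclaw
systemctl --user start openclaw
systemctl --user status openclaw
loginctl enable-linger "$USER"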

I initially ran it directly in a terminal, but the process died whenever I closed the session. Running it as a user service (the --user flag) together with loginctl enable-linger keeps it running after I log out and starts it again at boot, without needing an open login session.
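To confirm that lingering is actually switched on, one option is to read it back from loginctl. The snippet below is a minimal sketch and not part of OpenClaw: the function names are hypothetical, and it assumes `loginctl show-user <user> --property=Linger` prints a line of the form `Linger=yes` or `Linger=no`.

```python
import getpass
import subprocess
from typing import Optional


def parse_linger(show_user_output: str) -> bool:
    """Return True if the loginctl output contains the line 'Linger=yes'."""
    for line in show_user_output.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "Linger":
            return value == "yes"
    # No Linger line at all: treat it as disabled.
    return False


def linger_enabled(user: Optional[str] = None) -> bool:
    """Ask loginctl whether lingering is enabled (defaults to the current user)."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["loginctl", "show-user", user, "--property=Linger"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_linger(result.stdout)
```

Calling `linger_enabled()` on the VM should return True once `loginctl enable-linger` has been run for that user.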

Intelligent Model Routing

The key insight: different tasks need different models. Using a single model for everything is wasteful.

Here’s my routing configuration:

routing:
  default:
    model: glm-5
    provider: ollama
    reason: "Cost-effective for routine queries"

  rules:
    - match:
        type: coding
        complexity: high
      route:
        model: claude-sonnet
        provider: anthropic
        reason: "Best for intricate coding tasks"

    - match:
        type: reasoning
        context_length: long
      route:
        model: kimi-k2.5
        provider: moonshot
        reason: "Handles long context well"

    - match:
        type: voice
        task: briefing
      route:
        model: minimax-m2.7
        provider: minimax
        reason: "Natural voice synthesis"

    - match:
        type: background
        priority: low
      route:
        model: glm4-flash
        provider: zhipu
        reason: "Fast and cheap for lightweight tasks"

The routing logic works like this:

- GLM-5 via Ollama cloud relay: Default for most queries. Cheap enough that I don’t think about cost.
- Claude Sonnet: Only for complex coding. The expensive model, but worth it when needed.
- Kimi K2.5: Long context tasks. When I need to analyze documents or extended conversations.
- MiniMax M2.7: Voice morning briefings. TTS quality matters here.
- GLM4-Flash: Background tasks. Fast responses, minimal cost.
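The matching behind a configuration like this can be sketched in a few lines. This is an illustration only, under the assumption that rules are tried in order and a rule applies when every key in its `match` block equals the corresponding attribute of the request. How OpenClaw actually evaluates rules, and how a request gets its `type` or `complexity` label, is not covered here. `ROUTING` and `pick_route` are hypothetical names, and the `reason` fields are omitted for brevity.

```python
from typing import Any

# The routing configuration from above, written as a plain dict.
ROUTING: dict[str, Any] = {
    "default": {"model": "glm-5", "provider": "ollama"},
    "rules": [
        {"match": {"type": "coding", "complexity": "high"},
         "route": {"model": "claude-sonnet", "provider": "anthropic"}},
        {"match": {"type": "reasoning", "context_length": "long"},
         "route": {"model": "kimi-k2.5", "provider": "moonshot"}},
        {"match": {"type": "voice", "task": "briefing"},
         "route": {"model": "minimax-m2.7", "provider": "minimax"}},
        {"match": {"type": "background", "priority": "low"},
         "route": {"model": "glm4-flash", "provider": "zhipu"}},
    ],
}


def pick_route(request: dict[str, str], routing: dict[str, Any] = ROUTING) -> dict[str, str]:
    """Return the route of the first rule whose match keys all equal the request's."""
    for rule in routing["rules"]:
        if all(request.get(key) == value for key, value in rule["match"].items()):
            return rule["route"]
    # Nothing matched: fall back to the cheap default model.
    return routing["default"]
```

With this logic, `pick_route({"type": "coding", "complexity": "high"})` selects claude-sonnet, while a low-complexity coding request matches no rule and falls through to glm-5.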

I tried several variations before settling on this. Initially, I routed everything through GPT-4, but the costs added up quickly. The current setup costs roughly a ninth of what the single-model approach did (about $5 versus $45 per month).

Three-Tier Memory System

Memory was the hardest part to get right. I went through several iterations before landing on a three-tier approach.

Tier 1: Daily Markdown Logs

~/.openclaw/memory/
├── 2026-03-01.md
├── 2026-03-02.md
├── 2026-03-03.md
└── ...

Every conversation gets logged automatically. The format is simple:

## Session: 2026-03-23 08:15

**User**: What's the weather today?
**Assistant**: Currently 18°C, partly cloudy...

## Session: 2026-03-23 14:30

**User**: Remind me about the dentist appointment
**Assistant**: I've noted your dentist appointment...

This gives me automatic session tracking without any curation effort.
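For illustration, writing an entry in this format takes only a few lines. The sketch below is not OpenClaw's own logger: `append_exchange` is a hypothetical helper, and it simplifies by treating every exchange as its own session.

```python
from datetime import datetime
from pathlib import Path


def append_exchange(memory_dir: Path, when: datetime, user: str, assistant: str) -> Path:
    """Append one exchange to the daily log for `when` and return the log path."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    # One file per calendar day, e.g. 2026-03-23.md
    log_path = memory_dir / f"{when:%Y-%m-%d}.md"
    entry = (
        f"## Session: {when:%Y-%m-%d %H:%M}\n\n"
        f"**User**: {user}\n"
        f"**Assistant**: {assistant}\n\n"
    )
    # Append mode keeps earlier sessions from the same day intact.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return log_path
```

A call such as `append_exchange(Path.home() / ".openclaw" / "memory", datetime.now(), question, answer)` would add to today's file and create it if it does not exist yet.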

Tier 2: Curated Memory (MEMORY.md)

# Important Context

## Preferences
- Respond in English by default
- Use metric units
- Timezone: Asia/Shanghai

## Ongoing Projects
- Home automation dashboard (in progress)
- Blog migration to Astro (completed 2026-02-15)

## Important Decisions
- 2026-03-15: Switched from Zapier to n8n for automation
- 2026-03-20: Decided to use SQLite for local cache

This file contains decisions, preferences, and context I want the assistant to always know. I update it manually when something important happens.
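One simple way to make the assistant "always know" this file is to attach it to the system prompt on every request. The following is a sketch under that assumption. `build_system_prompt` and the separator line are hypothetical, and OpenClaw may inject curated memory differently.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, memory_path: Path) -> str:
    """Append the curated memory file to the base prompt, if the file has content."""
    if not memory_path.exists():
        return base_prompt
    memory = memory_path.read_text(encoding="utf-8").strip()
    if not memory:
        return base_prompt
    # Keep the memory clearly delimited so the model can tell it apart from instructions.
    return f"{base_prompt}\n\n--- Curated memory (MEMORY.md) ---\n{memory}"
```

Because the file is re-read on each call, manual edits to MEMORY.md take effect on the next message without restarting anything.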

Tier 3: ChromaDB Vector Database

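import os

import chromadb

# Expand "~" explicitly; the path is otherwise taken literally.
# On ChromaDB 0.4+, PersistentClient replaces the legacy
# Settings(chroma_db_impl="duckdb+parquet") configuration.
chroma = chromadb.PersistentClient(
    path=os.path.expanduser("~/.openclaw/chroma")
)

# get_or_create_collection avoids an error on the first run, before anything is indexed.
collection = chroma.get_or_create_collection("conversations")

def search_memory(query: str, n_results: int = 5):
    """Return the n_results stored documents most similar to the query."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return results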

ChromaDB indexes everything from Tier 1 and Tier 2, enabling semantic search across the entire conversation history. When I ask “What did I decide about the API caching?”, it finds the relevant conversation even if I don’t remember exact words.

The combination works because each tier serves a different purpose:

- Tier 1: Automatic, comprehensive, no effort required
- Tier 2: Curated, high-signal, persistent context
- Tier 3: Searchable, semantic, cross-references everything
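The indexing step that feeds Tier 1 into Tier 3 could look roughly like the sketch below. It assumes each "## Session:" header in a daily log starts a new chunk and that the log files are append-only, so a session's position in the file is a stable ID. `split_sessions` and `index_daily_log` are hypothetical names. The `collection` argument is expected to be a ChromaDB collection, and `upsert` is used so that re-indexing the same file does not create duplicates.

```python
import re
from pathlib import Path

SESSION_HEADER = re.compile(
    r"^## Session: (\d{4}-\d{2}-\d{2} \d{2}:\d{2})$", re.MULTILINE
)


def split_sessions(markdown: str) -> list[tuple[str, str]]:
    """Split a daily log into (timestamp, body) pairs, one per session header."""
    matches = list(SESSION_HEADER.finditer(markdown))
    sessions = []
    for i, match in enumerate(matches):
        # A session body runs until the next header or the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sessions.append((match.group(1), markdown[match.end():end].strip()))
    return sessions


def index_daily_log(collection, log_path: Path) -> int:
    """Upsert every session of one daily log; return how many were indexed."""
    sessions = split_sessions(log_path.read_text(encoding="utf-8"))
    if not sessions:
        return 0
    collection.upsert(
        ids=[f"{log_path.stem}#{i}" for i in range(len(sessions))],
        documents=[body for _, body in sessions],
        metadatas=[{"source": log_path.name, "session": ts} for ts, _ in sessions],
    )
    return len(sessions)
```

Storing the session timestamp as metadata keeps some chronological context available at query time, which pure vector search otherwise loses.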

Telegram Integration

Telegram became the exclusive interface. No web dashboard needed.

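import os

from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

# Read the bot token from the environment (the variable name is an assumption).
TELEGRAM_TOKEN = os.environ["TELEGRAM_TOKEN"]

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user_message = update.message.text
    user_id = update.effective_user.id

    # Route through OpenClaw (`openclaw` is the gateway client object, set up elsewhere)
    response = await openclaw.chat(
        message=user_message,
        user_id=user_id,
        include_memory=True
    )

    await update.message.reply_text(response)

app = Application.builder().token(TELEGRAM_TOKEN).build()
# Exclude commands such as /start so only plain text reaches this handler.
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
app.run_polling()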

I initially built a React dashboard, but found I never used it. Telegram is always on my phone, supports markdown, and handles voice messages. The bot runs as part of the OpenClaw service.

For bilingual support (French/English), I added simple language detection:

def detect_language(text: str) -> str:
    french_chars = set('àâäéèêëïîôùûüÿçœæ')
    if any(c in french_chars for c in text.lower()):
        return "fr"
    return "en"

# Then include in context
context = {
    "language": detect_language(user_message),
    "user_id": user_id
}
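One limitation of the accent check is that French messages without accented characters, such as "Bonjour, comment vas-tu ?", come back as "en". A possible refinement is sketched below, with the caveat that it is only a heuristic. `FRENCH_HINTS` and `detect_language_with_hints` are hypothetical names, and the word list is a small sample that would need tuning.

```python
import re

FRENCH_CHARS = set("àâäéèêëïîôùûüÿçœæ")
# Hypothetical hint list of common French words; extend it to fit your own usage.
FRENCH_HINTS = {
    "le", "la", "les", "un", "une", "des", "est",
    "je", "tu", "bonjour", "merci", "pour", "avec",
}


def detect_language_with_hints(text: str) -> str:
    """Return "fr" on an accented character or at least two French hint words, else "en"."""
    lowered = text.lower()
    if any(c in FRENCH_CHARS for c in lowered):
        return "fr"
    words = re.findall(r"[a-z]+", lowered)
    hits = sum(1 for word in words if word in FRENCH_HINTS)
    # Require two hint words so a single stray "la" or "pour" in English text does not flip the result.
    return "fr" if hits >= 2 else "en"
```

Accented characters alone can still misfire on English text containing words like "café", so a dedicated language-identification library may be worth considering if misdetections become a problem.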

What Didn’t Work

I tried several things that failed:

- Using only ChromaDB: Lost chronological context, retrieval was noisy
- Routing based on keywords: Too brittle, missed intent
- Web dashboard: Never used it, added maintenance burden
- Single premium model: Expensive and slow for simple queries

Summary

After 50 days of running this setup, the key insights are:

- Intelligent routing saves money: GLM-5 handles 80% of queries at 1% of the cost
- Memory tiers matter: Each tier solves a different problem
- Interface simplicity wins: One interface (Telegram) is better than many
- Systemd reliability: Running as a service means it’s always available

The system now handles about 50-100 queries daily, costs under $5/month in API calls, and maintains context across weeks of conversations. The three-tier memory system means I can reference decisions from a month ago, and the routing ensures I’m not burning budget on simple questions.

Single GPT-4 approach: ~$45/month
Multi-model routing: ~$5/month
Savings: ~89% (about $40/month)
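The arithmetic behind these figures is easy to check. The sketch below uses only the numbers quoted in this article and treats $5 as the monthly cost, although the actual figure is "under $5". The 30-day month and the function names are assumptions for illustration.

```python
def savings_ratio(single_model_cost: float, multi_model_cost: float) -> float:
    """Fraction of the single-model bill saved by switching to routing."""
    return (single_model_cost - multi_model_cost) / single_model_cost


def cost_per_query(monthly_cost: float, queries_per_day: int, days: int = 30) -> float:
    """Average cost of one query, assuming a month of `days` days."""
    return monthly_cost / (queries_per_day * days)


print(f"savings: {savings_ratio(45.0, 5.0):.1%}")                 # savings: 88.9%
print(f"at 50 queries/day:  ${cost_per_query(5.0, 50):.4f}")      # $0.0033
print(f"at 100 queries/day: ${cost_per_query(5.0, 100):.4f}")     # $0.0017
```

So the savings come to just under 89%, and each query costs a fraction of a cent on average at the stated volume of 50-100 queries per day.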

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit: I gave my home a brain. Here's what 50 days of self-hosted AI looks like
👨‍💻 OpenClaw Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!