How to Automate Homelab Management with AI Agents

Mar 27, 2026

Purpose

Managing a homelab is time-consuming. Every new service needs DNS entries, proxy configuration, and monitoring. Resources get over-allocated because I don’t know the actual usage patterns.

This post shows how to use AI agents to automate homelab infrastructure management. The key point is letting AI observe your services for a week, then accepting its optimization suggestions.

The Problem

I run a Proxmox homelab with various services: Plex, Tandoor, the *arr suite, and more. Over time, I noticed several problems:

Resource waste - Containers allocated 8GB RAM but never using more than 2GB
Manual configuration - Every new service requires DNS entries in Pi-hole, proxy config in Nginx Proxy Manager
Reactive troubleshooting - Problems discovered only after they cause issues
IOPS overhead - NFS mounts not optimized for my workload

I wanted an AI that could watch my infrastructure, learn patterns, and automate the routine work.

The Breakthrough: Reddit’s Best Use Case

I found a Reddit thread asking about the best use case of OpenClaw. One response caught my attention:

“I have a proxmox homelab that runs various services (plex, tandoor, *.arr suite, etc.). I had it watch service loads across a week then have it suggest scaling options (‘this container is assigned 8G ram, but it never goes past 2GB, give it 3 and keep watching?’). It has also adjusted NFS mount tweaks across my services to lessen IOPS overhead.”

This was exactly what I needed. The user went further:

“It watches existing and new services coming into my homelab… adds DNS entries to pihole, adds a host entry in my NPM. If I give it initial access to the new service, it will hop in, add its own SSH key…”

And the most interesting part:

“Every new service gets added to my local ollama watchdog. It checks the environment every 30 minutes and only reports in if there’s a problem or a significant change.”

I decided to implement this architecture.

Architecture Overview

Before diving into implementation, here’s the overall architecture:

+-----------------------------------------------------------------------------+
|                      AI Agent Orchestrator (OpenClaw)                        |
|                                                                              |
|  +--------------+  +--------------+  +--------------+  +------------------+ |
|  |   Proxmox    |  |    Docker    |  |   Pi-hole    |  | Nginx Proxy     | |
|  |  API Access  |  |   Engine API |  |  DNS API     |  | Manager API     | |
|  +------+-------+  +------+-------+  +------+-------+  +-------+----------+ |
|         |                 |                 |                   |            |
|         +-----------------+-----------------+-------------------+            |
|                                    |                                         |
|                    +---------------v---------------+                          |
|                    |    Observation & Analysis     |                          |
|                    |    (Weekly Load Metrics)     |                          |
|                    +---------------+---------------+                          |
|                                    |                                         |
|         +--------------------------+--------------------------+              |
|         |                          |                          |              |
|  +------v------+  +----------------v----------------+  +-----v-----------+   |
|  |   Scaling   |  |   Configuration Engine         |  |   Watchdog      |   |
|  | Suggestions |  |   (DNS/Proxy/SSH)              |  |   (Ollama)      |   |
|  +-------------+  +--------------------------------+  +-----------------+   |
|                                                                              |
+-----------------------------------------------------------------------------+

The AI agent connects to:

Proxmox API - Monitor VM/container resources
Docker Engine API - Container metrics
Pi-hole API - DNS automation
Nginx Proxy Manager API - Reverse proxy configuration
Ollama - Local LLM for intelligent watchdog

Step 1: Set Up Proxmox API Access

First, I needed to give the AI access to Proxmox.

Create API Token

In Proxmox web UI:

Go to Datacenter > Permissions > API Tokens
Create a new token with appropriate permissions
Copy the token ID and secret

Configure the Connection

proxmox:
  host: "https://pve.example.com:8006"
  api_token: "user@pam!tokenid=uuid-here"
  verify_ssl: false  # For self-signed certs
  node: "pve"

Test the Connection

import requests
import os

PROXMOX_HOST = os.environ.get("PROXMOX_HOST", "https://pve.local:8006")
API_TOKEN = os.environ.get("PROXMOX_API_TOKEN")

headers = {
    "Authorization": f"PVEAPIToken={API_TOKEN}",
    "Content-Type": "application/json"
}

# List all VMs
url = f"{PROXMOX_HOST}/api2/json/nodes/pve/qemu"
resp = requests.get(url, headers=headers, verify=False)
print(resp.json())

When I ran this:

export PROXMOX_HOST="https://192.168.1.10:8006"
export PROXMOX_API_TOKEN="root@pam!automation=xxxx-xxxx-xxxx"
python test_proxmox.py

I got a list of all my VMs with their current status.

Step 2: Resource Monitoring and Optimization

The most valuable feature is resource right-sizing. I built a monitor that tracks usage over a week.

import requests
import os
from datetime import datetime
from collections import defaultdict

class HomelabMonitor:
    def __init__(self):
        self.proxmox_host = os.environ["PROXMOX_HOST"]
        self.proxmox_token = os.environ["PROXMOX_API_TOKEN"]
        self.metrics_history = defaultdict(list)

    def get_headers(self):
        return {
            "Authorization": f"PVEAPIToken={self.proxmox_token}",
            "Content-Type": "application/json"
        }

    def get_vms(self, node="pve"):
        """List all VMs on a node"""
        url = f"{self.proxmox_host}:8006/api2/json/nodes/{node}/qemu"
        resp = requests.get(url, headers=self.get_headers(), verify=False)
        return resp.json().get("data", [])

    def get_vm_rrd(self, node, vmid, timeframe="week"):
        """Get resource usage data for a VM"""
        url = f"{self.proxmox_host}:8006/api2/json/nodes/{node}/qemu/{vmid}/rrddata"
        params = {"timeframe": timeframe, "cf": "AVERAGE"}
        resp = requests.get(url, headers=self.get_headers(),
                            params=params, verify=False)
        return resp.json().get("data", [])

    def analyze_resource_usage(self, node="pve"):
        """Analyze resource usage and suggest optimizations"""
        vms = self.get_vms(node)
        suggestions = []

        for vm in vms:
            vmid = vm["vmid"]
            name = vm["name"]
            maxmem = vm.get("maxmem", 0) / (1024**3)  # Convert to GB

            # Get historical data
            rrd_data = self.get_vm_rrd(node, vmid, "week")

            if rrd_data:
                # Calculate max memory usage
                mem_usage_samples = [d.get("mem", 0) for d in rrd_data]
                max_mem_used = max(mem_usage_samples)

                # Suggest right-sizing if consistently low usage
                if max_mem_used < 0.25 and maxmem > 2:
                    suggested_mem = max(2, max_mem_used * 1.5)
                    suggestions.append({
                        "vmid": vmid,
                        "name": name,
                        "current_mem_gb": round(maxmem, 1),
                        "max_used_percent": round(max_mem_used * 100, 1),
                        "suggested_mem_gb": round(suggested_mem, 1),
                        "potential_savings_gb": round(maxmem - suggested_mem, 1)
                    })

        return suggestions

    def generate_report(self, suggestions):
        """Generate a human-readable optimization report"""
        if not suggestions:
            return "No optimization opportunities found this week."

        report = [f"## Homelab Resource Optimization Report\n"]
        report.append(f"**Date:** {datetime.now().strftime('%Y-%m-%d')}\n")
        report.append("### Right-Sizing Opportunities\n")

        for s in suggestions:
            report.append(f"\n**{s['name']}** (VMID: {s['vmid']})")
            report.append(f"- Current: {s['current_mem_gb']}GB")
            report.append(f"- Peak usage: {s['max_used_percent']}%")
            report.append(f"- Suggested: {s['suggested_mem_gb']}GB")
            report.append(f"- Savings: {s['potential_savings_gb']}GB")

        total_savings = sum(s['potential_savings_gb'] for s in suggestions)
        report.append(f"\n**Total potential memory savings: {total_savings:.1f}GB**")

        return "\n".join(report)

if __name__ == "__main__":
    monitor = HomelabMonitor()
    suggestions = monitor.analyze_resource_usage()
    print(monitor.generate_report(suggestions))

When I ran this after a week of observation:

## Homelab Resource Optimization Report

**Date:** 2026-03-27

### Right-Sizing Opportunities

**plex** (VMID: 100)
- Current: 8.0GB
- Peak usage: 22.3%
- Suggested: 3.0GB
- Savings: 5.0GB

**sonarr** (VMID: 101)
- Current: 4.0GB
- Peak usage: 18.5%
- Suggested: 2.0GB
- Savings: 2.0GB

**tandoor** (VMID: 102)
- Current: 2.0GB
- Peak usage: 35.2%
- Suggested: 1.5GB
- Savings: 0.5GB

**Total potential memory savings: 7.5GB**

I recovered 7.5GB of RAM by right-sizing just three containers.

Step 3: Pi-hole DNS Automation

Every new service needs a DNS entry. I automated this with the Pi-hole API.

import requests

class PiholeDNS:
    def __init__(self, host, password):
        self.host = host
        self.password = password
        self.session = requests.Session()
        self._authenticate()

    def _authenticate(self):
        """Authenticate with Pi-hole"""
        url = f"{self.host}/api/auth"
        data = {"password": self.password}
        resp = self.session.post(url, json=data)
        self.session.headers.update({
            "X-XSRF-Token": resp.json()["session"]["sid"]
        })

    def add_local_domain(self, domain, ip, comment="Auto-added by homelab-automation"):
        """Add a local DNS entry"""
        url = f"{self.host}/api/domains/dns/local"
        data = {
            "domain": domain,
            "ip": ip,
            "type": "A",
            "comment": comment
        }
        resp = self.session.post(url, json=data)
        return resp.json()

    def list_domains(self):
        """List all local DNS domains"""
        url = f"{self.host}/api/domains/dns/local"
        resp = self.session.get(url)
        return resp.json()

# Usage
pihole = PiholeDNS("http://pihole.local", os.environ["PIHOLE_PASSWORD"])
pihole.add_local_domain("newservice.home.arpa", "192.168.1.100")

Now when I deploy a new container, the AI automatically adds the DNS entry.

Step 4: Nginx Proxy Manager Integration

The next step is automatic reverse proxy configuration.

import requests

class NginxProxyManager:
    def __init__(self, host, email, password):
        self.host = host
        self.email = email
        self.password = password
        self.session = requests.Session()
        self._authenticate()

    def _authenticate(self):
        """Authenticate with NPM"""
        url = f"{self.host}/api/tokens"
        data = {
            "identity": self.email,
            "secret": self.password
        }
        resp = self.session.post(url, json=data)
        self.token = resp.json().get("token")
        self.session.headers.update({
            "Authorization": f"Bearer {self.token}"
        })

    def create_proxy_host(self, domain_names, forward_host, forward_port, ssl=True):
        """Create a new proxy host"""
        url = f"{self.host}/api/nginx/proxy-hosts"
        data = {
            "domain_names": domain_names,
            "forward_host": forward_host,
            "forward_port": forward_port,
            "certificate_id": None,  # Add SSL cert ID if available
            "ssl_forced": ssl,
            "http2_support": True,
            "block_exploits": True,
            "caching_enabled": True
        }
        resp = self.session.post(url, json=data)
        return resp.json()

# Usage
npm = NginxProxyManager(
    "http://npm.local:81",
    os.environ["NPM_EMAIL"],
    os.environ["NPM_PASSWORD"]
)
npm.create_proxy_host(
    domain_names=["newservice.home.arpa"],
    forward_host="192.168.1.100",
    forward_port=8080
)

Step 5: Ollama Watchdog

The most powerful component is the Ollama watchdog. It uses a local LLM to monitor services intelligently.

Install Ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a lightweight model
ollama pull llama3.2:3b

# Start Ollama service
ollama serve

Implement the Watchdog

import requests
from datetime import datetime

class OllamaWatchdog:
    def __init__(self, ollama_host="http://localhost:11434", model="llama3.2:3b"):
        self.ollama_host = ollama_host
        self.model = model
        self.services = self.load_services()

    def load_services(self):
        """Load list of services to monitor"""
        return [
            {"name": "plex", "url": "http://plex:32400", "check": "health"},
            {"name": "pihole", "url": "http://pihole/admin", "check": "health"},
            {"name": "npm", "url": "http://npm:81", "check": "health"},
            {"name": "tandoor", "url": "http://tandoor:8080", "check": "health"},
        ]

    def check_service_health(self, service):
        """Check if a service is responding"""
        try:
            resp = requests.get(service["url"], timeout=10)
            return resp.status_code == 200
        except:
            return False

    def analyze_with_ollama(self, prompt):
        """Use Ollama for intelligent analysis"""
        url = f"{self.ollama_host}/api/generate"
        data = {
            "model": self.model,
            "prompt": prompt,
            "stream": False
        }
        resp = requests.post(url, json=data, timeout=60)
        return resp.json().get("response", "")

    def run_checks(self):
        """Run health checks on all services"""
        results = []
        problems = []

        for service in self.services:
            healthy = self.check_service_health(service)
            results.append({
                "service": service["name"],
                "healthy": healthy,
                "timestamp": datetime.now().isoformat()
            })

            if not healthy:
                problems.append(service["name"])

        # Only report if there are problems
        if problems:
            analysis_prompt = f"""
            The following services are not responding: {', '.join(problems)}

            Based on typical homelab configurations, what are the most likely causes
            and recommended troubleshooting steps? Keep response under 200 words.
            """

            analysis = self.analyze_with_ollama(analysis_prompt)
            self.send_alert(problems, analysis)

        return results

    def send_alert(self, problems, analysis):
        """Send alert notification"""
        message = f"""
        ## Homelab Watchdog Alert

        **Time:** {datetime.now().strftime('%Y-%m-%d %H:%M')}
        **Problems:** {', '.join(problems)}

        ### AI Analysis
        {analysis}
        """
        print(message)  # Replace with actual notification (Telegram, email, etc.)

if __name__ == "__main__":
    watchdog = OllamaWatchdog()
    watchdog.run_checks()

I set this to run every 30 minutes via cron:

*/30 * * * * /usr/bin/python3 /opt/homelab/ollama_watchdog.py >> /var/log/watchdog.log 2>&1

Step 6: Automation Rules Configuration

I defined clear rules for what the AI can do automatically versus what needs approval:

automation_rules:
  resource_optimization:
    observation_period_days: 7
    memory_threshold_percent: 25  # Alert if using <25% of allocated
    cpu_threshold_percent: 20
    action: "suggest"  # Start with suggestions, not auto-apply

  new_service_detection:
    scan_interval_minutes: 5
    auto_add_dns: true
    auto_add_proxy: true
    auto_add_ssh_key: false  # Requires manual approval

  watchdog:
    check_interval_minutes: 30
    alert_on:
      - service_down
      - memory_spike_percent: 50
      - disk_space_percent: 90

Common Mistakes I Made

Giving AI too much autonomy initially - I started with action: "suggest" mode before enabling auto-apply. This prevented accidental resource changes.
Ignoring security boundaries - I created separate API tokens with limited scope for each service. Never use root tokens.
No rollback plan - I now keep backups of configurations before any AI modifications.
Over-alerting - I configured the watchdog to only report significant changes, not every minor fluctuation. Otherwise, I’d be flooded with notifications.
Skipping observation period - I learned to let the AI observe for at least a week before making optimization suggestions. One day of data isn’t enough.

Results After One Month

After running this setup for a month:

Metric	Before	After	Improvement
Memory allocated	48GB	36GB	12GB recovered
Time per new service	30 min	5 min	25 min saved
Problem detection	Reactive	Proactive	Issues caught early
NFS IOPS overhead	High	Optimized	Better performance

The biggest win is the 12GB of memory recovered from right-sizing containers. That’s enough to run two additional medium-sized services.

Summary

In this post, I showed how to automate homelab management with AI agents. The key components are:

Proxmox API access for monitoring VM/container resources
Pi-hole API for automatic DNS entries
Nginx Proxy Manager API for reverse proxy configuration
Ollama watchdog for intelligent health monitoring

The key point is starting with observation mode, verifying suggestions, then enabling automation. Let the AI watch your infrastructure for a week before accepting its optimization recommendations.

My homelab now manages itself. The AI handles routine configuration, suggests resource optimizations, and only alerts me when something actually needs attention.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!