How to Automate Homelab Management with AI Agents
Purpose
Managing a homelab is time-consuming. Every new service needs DNS entries, proxy configuration, and monitoring. Resources get over-allocated because I don’t know the actual usage patterns.
This post shows how to use AI agents to automate homelab infrastructure management. The key point is letting AI observe your services for a week, then accepting its optimization suggestions.
The Problem
I run a Proxmox homelab with various services: Plex, Tandoor, the *arr suite, and more. Over time, I noticed several problems:
- Resource waste - Containers allocated 8GB RAM but never using more than 2GB
- Manual configuration - Every new service requires DNS entries in Pi-hole, proxy config in Nginx Proxy Manager
- Reactive troubleshooting - Problems discovered only after they cause issues
- IOPS overhead - NFS mounts not optimized for my workload
I wanted an AI that could watch my infrastructure, learn patterns, and automate the routine work.
The Breakthrough: Reddit’s Best Use Case
I found a Reddit thread asking about the best use case of OpenClaw. One response caught my attention:
“I have a proxmox homelab that runs various services (plex, tandoor, *.arr suite, etc.). I had it watch service loads across a week then have it suggest scaling options (‘this container is assigned 8G ram, but it never goes past 2GB, give it 3 and keep watching?’). It has also adjusted NFS mount tweaks across my services to lessen IOPS overhead.”
This was exactly what I needed. The user went further:
“It watches existing and new services coming into my homelab… adds DNS entries to pihole, adds a host entry in my NPM. If I give it initial access to the new service, it will hop in, add its own SSH key…”
And the most interesting part:
“Every new service gets added to my local ollama watchdog. It checks the environment every 30 minutes and only reports in if there’s a problem or a significant change.”
I decided to implement this architecture.
Architecture Overview
Before diving into implementation, here’s the overall architecture:
+-----------------------------------------------------------------------------+| AI Agent Orchestrator (OpenClaw) || || +--------------+ +--------------+ +--------------+ +------------------+ || | Proxmox | | Docker | | Pi-hole | | Nginx Proxy | || | API Access | | Engine API | | DNS API | | Manager API | || +------+-------+ +------+-------+ +------+-------+ +-------+----------+ || | | | | || +-----------------+-----------------+-------------------+ || | || +---------------v---------------+ || | Observation & Analysis | || | (Weekly Load Metrics) | || +---------------+---------------+ || | || +--------------------------+--------------------------+ || | | | || +------v------+ +----------------v----------------+ +-----v-----------+ || | Scaling | | Configuration Engine | | Watchdog | || | Suggestions | | (DNS/Proxy/SSH) | | (Ollama) | || +-------------+ +--------------------------------+ +-----------------+ || |+-----------------------------------------------------------------------------+The AI agent connects to:
- Proxmox API - Monitor VM/container resources
- Docker Engine API - Container metrics
- Pi-hole API - DNS automation
- Nginx Proxy Manager API - Reverse proxy configuration
- Ollama - Local LLM for intelligent watchdog
Step 1: Set Up Proxmox API Access
First, I needed to give the AI access to Proxmox.
Create API Token
In Proxmox web UI:
- Go to Datacenter > Permissions > API Tokens
- Create a new token with appropriate permissions
- Copy the token ID and secret
Configure the Connection
proxmox: host: "https://pve.example.com:8006" api_token: "user@pam!tokenid=uuid-here" verify_ssl: false # For self-signed certs node: "pve"Test the Connection
import requestsimport os
PROXMOX_HOST = os.environ.get("PROXMOX_HOST", "https://pve.local:8006")API_TOKEN = os.environ.get("PROXMOX_API_TOKEN")
headers = { "Authorization": f"PVEAPIToken={API_TOKEN}", "Content-Type": "application/json"}
# List all VMsurl = f"{PROXMOX_HOST}/api2/json/nodes/pve/qemu"resp = requests.get(url, headers=headers, verify=False)print(resp.json())When I ran this:
export PROXMOX_HOST="https://192.168.1.10:8006"export PROXMOX_API_TOKEN="root@pam!automation=xxxx-xxxx-xxxx"python test_proxmox.pyI got a list of all my VMs with their current status.
Step 2: Resource Monitoring and Optimization
The most valuable feature is resource right-sizing. I built a monitor that tracks usage over a week.
import requestsimport osfrom datetime import datetimefrom collections import defaultdict
class HomelabMonitor: def __init__(self): self.proxmox_host = os.environ["PROXMOX_HOST"] self.proxmox_token = os.environ["PROXMOX_API_TOKEN"] self.metrics_history = defaultdict(list)
def get_headers(self): return { "Authorization": f"PVEAPIToken={self.proxmox_token}", "Content-Type": "application/json" }
def get_vms(self, node="pve"): """List all VMs on a node""" url = f"{self.proxmox_host}:8006/api2/json/nodes/{node}/qemu" resp = requests.get(url, headers=self.get_headers(), verify=False) return resp.json().get("data", [])
def get_vm_rrd(self, node, vmid, timeframe="week"): """Get resource usage data for a VM""" url = f"{self.proxmox_host}:8006/api2/json/nodes/{node}/qemu/{vmid}/rrddata" params = {"timeframe": timeframe, "cf": "AVERAGE"} resp = requests.get(url, headers=self.get_headers(), params=params, verify=False) return resp.json().get("data", [])
def analyze_resource_usage(self, node="pve"): """Analyze resource usage and suggest optimizations""" vms = self.get_vms(node) suggestions = []
for vm in vms: vmid = vm["vmid"] name = vm["name"] maxmem = vm.get("maxmem", 0) / (1024**3) # Convert to GB
# Get historical data rrd_data = self.get_vm_rrd(node, vmid, "week")
if rrd_data: # Calculate max memory usage mem_usage_samples = [d.get("mem", 0) for d in rrd_data] max_mem_used = max(mem_usage_samples)
# Suggest right-sizing if consistently low usage if max_mem_used < 0.25 and maxmem > 2: suggested_mem = max(2, max_mem_used * 1.5) suggestions.append({ "vmid": vmid, "name": name, "current_mem_gb": round(maxmem, 1), "max_used_percent": round(max_mem_used * 100, 1), "suggested_mem_gb": round(suggested_mem, 1), "potential_savings_gb": round(maxmem - suggested_mem, 1) })
return suggestions
def generate_report(self, suggestions): """Generate a human-readable optimization report""" if not suggestions: return "No optimization opportunities found this week."
report = [f"## Homelab Resource Optimization Report\n"] report.append(f"**Date:** {datetime.now().strftime('%Y-%m-%d')}\n") report.append("### Right-Sizing Opportunities\n")
for s in suggestions: report.append(f"\n**{s['name']}** (VMID: {s['vmid']})") report.append(f"- Current: {s['current_mem_gb']}GB") report.append(f"- Peak usage: {s['max_used_percent']}%") report.append(f"- Suggested: {s['suggested_mem_gb']}GB") report.append(f"- Savings: {s['potential_savings_gb']}GB")
total_savings = sum(s['potential_savings_gb'] for s in suggestions) report.append(f"\n**Total potential memory savings: {total_savings:.1f}GB**")
return "\n".join(report)
if __name__ == "__main__": monitor = HomelabMonitor() suggestions = monitor.analyze_resource_usage() print(monitor.generate_report(suggestions))When I ran this after a week of observation:
## Homelab Resource Optimization Report
**Date:** 2026-03-27
### Right-Sizing Opportunities
**plex** (VMID: 100)- Current: 8.0GB- Peak usage: 22.3%- Suggested: 3.0GB- Savings: 5.0GB
**sonarr** (VMID: 101)- Current: 4.0GB- Peak usage: 18.5%- Suggested: 2.0GB- Savings: 2.0GB
**tandoor** (VMID: 102)- Current: 2.0GB- Peak usage: 35.2%- Suggested: 1.5GB- Savings: 0.5GB
**Total potential memory savings: 7.5GB**I recovered 7.5GB of RAM by right-sizing just three containers.
Step 3: Pi-hole DNS Automation
Every new service needs a DNS entry. I automated this with the Pi-hole API.
import requests
class PiholeDNS: def __init__(self, host, password): self.host = host self.password = password self.session = requests.Session() self._authenticate()
def _authenticate(self): """Authenticate with Pi-hole""" url = f"{self.host}/api/auth" data = {"password": self.password} resp = self.session.post(url, json=data) self.session.headers.update({ "X-XSRF-Token": resp.json()["session"]["sid"] })
def add_local_domain(self, domain, ip, comment="Auto-added by homelab-automation"): """Add a local DNS entry""" url = f"{self.host}/api/domains/dns/local" data = { "domain": domain, "ip": ip, "type": "A", "comment": comment } resp = self.session.post(url, json=data) return resp.json()
def list_domains(self): """List all local DNS domains""" url = f"{self.host}/api/domains/dns/local" resp = self.session.get(url) return resp.json()
# Usagepihole = PiholeDNS("http://pihole.local", os.environ["PIHOLE_PASSWORD"])pihole.add_local_domain("newservice.home.arpa", "192.168.1.100")Now when I deploy a new container, the AI automatically adds the DNS entry.
Step 4: Nginx Proxy Manager Integration
The next step is automatic reverse proxy configuration.
import requests
class NginxProxyManager: def __init__(self, host, email, password): self.host = host self.email = email self.password = password self.session = requests.Session() self._authenticate()
def _authenticate(self): """Authenticate with NPM""" url = f"{self.host}/api/tokens" data = { "identity": self.email, "secret": self.password } resp = self.session.post(url, json=data) self.token = resp.json().get("token") self.session.headers.update({ "Authorization": f"Bearer {self.token}" })
def create_proxy_host(self, domain_names, forward_host, forward_port, ssl=True): """Create a new proxy host""" url = f"{self.host}/api/nginx/proxy-hosts" data = { "domain_names": domain_names, "forward_host": forward_host, "forward_port": forward_port, "certificate_id": None, # Add SSL cert ID if available "ssl_forced": ssl, "http2_support": True, "block_exploits": True, "caching_enabled": True } resp = self.session.post(url, json=data) return resp.json()
# Usagenpm = NginxProxyManager( "http://npm.local:81", os.environ["NPM_EMAIL"], os.environ["NPM_PASSWORD"])npm.create_proxy_host( domain_names=["newservice.home.arpa"], forward_host="192.168.1.100", forward_port=8080)Step 5: Ollama Watchdog
The most powerful component is the Ollama watchdog. It uses a local LLM to monitor services intelligently.
Install Ollama
# Install Ollamacurl -fsSL https://ollama.com/install.sh | sh
# Pull a lightweight modelollama pull llama3.2:3b
# Start Ollama serviceollama serveImplement the Watchdog
import requestsfrom datetime import datetime
class OllamaWatchdog: def __init__(self, ollama_host="http://localhost:11434", model="llama3.2:3b"): self.ollama_host = ollama_host self.model = model self.services = self.load_services()
def load_services(self): """Load list of services to monitor""" return [ {"name": "plex", "url": "http://plex:32400", "check": "health"}, {"name": "pihole", "url": "http://pihole/admin", "check": "health"}, {"name": "npm", "url": "http://npm:81", "check": "health"}, {"name": "tandoor", "url": "http://tandoor:8080", "check": "health"}, ]
def check_service_health(self, service): """Check if a service is responding""" try: resp = requests.get(service["url"], timeout=10) return resp.status_code == 200 except: return False
def analyze_with_ollama(self, prompt): """Use Ollama for intelligent analysis""" url = f"{self.ollama_host}/api/generate" data = { "model": self.model, "prompt": prompt, "stream": False } resp = requests.post(url, json=data, timeout=60) return resp.json().get("response", "")
def run_checks(self): """Run health checks on all services""" results = [] problems = []
for service in self.services: healthy = self.check_service_health(service) results.append({ "service": service["name"], "healthy": healthy, "timestamp": datetime.now().isoformat() })
if not healthy: problems.append(service["name"])
# Only report if there are problems if problems: analysis_prompt = f""" The following services are not responding: {', '.join(problems)}
Based on typical homelab configurations, what are the most likely causes and recommended troubleshooting steps? Keep response under 200 words. """
analysis = self.analyze_with_ollama(analysis_prompt) self.send_alert(problems, analysis)
return results
def send_alert(self, problems, analysis): """Send alert notification""" message = f""" ## Homelab Watchdog Alert
**Time:** {datetime.now().strftime('%Y-%m-%d %H:%M')} **Problems:** {', '.join(problems)}
### AI Analysis {analysis} """ print(message) # Replace with actual notification (Telegram, email, etc.)
if __name__ == "__main__": watchdog = OllamaWatchdog() watchdog.run_checks()I set this to run every 30 minutes via cron:
*/30 * * * * /usr/bin/python3 /opt/homelab/ollama_watchdog.py >> /var/log/watchdog.log 2>&1Step 6: Automation Rules Configuration
I defined clear rules for what the AI can do automatically versus what needs approval:
automation_rules: resource_optimization: observation_period_days: 7 memory_threshold_percent: 25 # Alert if using <25% of allocated cpu_threshold_percent: 20 action: "suggest" # Start with suggestions, not auto-apply
new_service_detection: scan_interval_minutes: 5 auto_add_dns: true auto_add_proxy: true auto_add_ssh_key: false # Requires manual approval
watchdog: check_interval_minutes: 30 alert_on: - service_down - memory_spike_percent: 50 - disk_space_percent: 90Common Mistakes I Made
-
Giving AI too much autonomy initially - I started with
action: "suggest"mode before enabling auto-apply. This prevented accidental resource changes. -
Ignoring security boundaries - I created separate API tokens with limited scope for each service. Never use root tokens.
-
No rollback plan - I now keep backups of configurations before any AI modifications.
-
Over-alerting - I configured the watchdog to only report significant changes, not every minor fluctuation. Otherwise, I’d be flooded with notifications.
-
Skipping observation period - I learned to let the AI observe for at least a week before making optimization suggestions. One day of data isn’t enough.
Results After One Month
After running this setup for a month:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Memory allocated | 48GB | 36GB | 12GB recovered |
| Time per new service | 30 min | 5 min | 25 min saved |
| Problem detection | Reactive | Proactive | Issues caught early |
| NFS IOPS overhead | High | Optimized | Better performance |
The biggest win is the 12GB of memory recovered from right-sizing containers. That’s enough to run two additional medium-sized services.
Summary
In this post, I showed how to automate homelab management with AI agents. The key components are:
- Proxmox API access for monitoring VM/container resources
- Pi-hole API for automatic DNS entries
- Nginx Proxy Manager API for reverse proxy configuration
- Ollama watchdog for intelligent health monitoring
The key point is starting with observation mode, verifying suggestions, then enabling automation. Let the AI watch your infrastructure for a week before accepting its optimization recommendations.
My homelab now manages itself. The AI handles routine configuration, suggests resource optimizations, and only alerts me when something actually needs attention.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Proxmox VE API Documentation
- 👨💻 Pi-hole Documentation
- 👨💻 Nginx Proxy Manager
- 👨💻 Ollama Models Library
- 👨💻 Reddit: Best Use Case of OpenClaw
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments