Skip to content

How to Automate Homelab Management with AI Agents

Purpose

Managing a homelab is time-consuming. Every new service needs DNS entries, proxy configuration, and monitoring. Resources get over-allocated because I don’t know the actual usage patterns.

This post shows how to use AI agents to automate homelab infrastructure management. The key point is letting AI observe your services for a week, then accepting its optimization suggestions.

The Problem

I run a Proxmox homelab with various services: Plex, Tandoor, the *arr suite, and more. Over time, I noticed several problems:

  1. Resource waste - Containers allocated 8GB RAM but never using more than 2GB
  2. Manual configuration - Every new service requires DNS entries in Pi-hole, proxy config in Nginx Proxy Manager
  3. Reactive troubleshooting - Problems discovered only after they cause issues
  4. IOPS overhead - NFS mounts not optimized for my workload

I wanted an AI that could watch my infrastructure, learn patterns, and automate the routine work.

The Breakthrough: Reddit’s Best Use Case

I found a Reddit thread asking about the best use case of OpenClaw. One response caught my attention:

“I have a proxmox homelab that runs various services (plex, tandoor, *.arr suite, etc.). I had it watch service loads across a week then have it suggest scaling options (‘this container is assigned 8G ram, but it never goes past 2GB, give it 3 and keep watching?’). It has also adjusted NFS mount tweaks across my services to lessen IOPS overhead.”

This was exactly what I needed. The user went further:

“It watches existing and new services coming into my homelab… adds DNS entries to pihole, adds a host entry in my NPM. If I give it initial access to the new service, it will hop in, add its own SSH key…”

And the most interesting part:

“Every new service gets added to my local ollama watchdog. It checks the environment every 30 minutes and only reports in if there’s a problem or a significant change.”

I decided to implement this architecture.

Architecture Overview

Before diving into implementation, here’s the overall architecture:

Architecture Diagram
+-----------------------------------------------------------------------------+
| AI Agent Orchestrator (OpenClaw) |
| |
| +--------------+ +--------------+ +--------------+ +------------------+ |
| | Proxmox | | Docker | | Pi-hole | | Nginx Proxy | |
| | API Access | | Engine API | | DNS API | | Manager API | |
| +------+-------+ +------+-------+ +------+-------+ +-------+----------+ |
| | | | | |
| +-----------------+-----------------+-------------------+ |
| | |
| +---------------v---------------+ |
| | Observation & Analysis | |
| | (Weekly Load Metrics) | |
| +---------------+---------------+ |
| | |
| +--------------------------+--------------------------+ |
| | | | |
| +------v------+ +----------------v----------------+ +-----v-----------+ |
| | Scaling | | Configuration Engine | | Watchdog | |
| | Suggestions | | (DNS/Proxy/SSH) | | (Ollama) | |
| +-------------+ +--------------------------------+ +-----------------+ |
| |
+-----------------------------------------------------------------------------+

The AI agent connects to:

  • Proxmox API - Monitor VM/container resources
  • Docker Engine API - Container metrics
  • Pi-hole API - DNS automation
  • Nginx Proxy Manager API - Reverse proxy configuration
  • Ollama - Local LLM for intelligent watchdog

Step 1: Set Up Proxmox API Access

First, I needed to give the AI access to Proxmox.

Create API Token

In Proxmox web UI:

  1. Go to Datacenter > Permissions > API Tokens
  2. Create a new token with appropriate permissions
  3. Copy the token ID and secret

Configure the Connection

proxmox-config.yaml
proxmox:
host: "https://pve.example.com:8006"
api_token: "user@pam!tokenid=uuid-here"
verify_ssl: false # For self-signed certs
node: "pve"

Test the Connection

test_proxmox.py
import requests
import os
PROXMOX_HOST = os.environ.get("PROXMOX_HOST", "https://pve.local:8006")
API_TOKEN = os.environ.get("PROXMOX_API_TOKEN")
headers = {
"Authorization": f"PVEAPIToken={API_TOKEN}",
"Content-Type": "application/json"
}
# List all VMs
url = f"{PROXMOX_HOST}/api2/json/nodes/pve/qemu"
resp = requests.get(url, headers=headers, verify=False)
print(resp.json())

When I ran this:

Terminal
export PROXMOX_HOST="https://192.168.1.10:8006"
export PROXMOX_API_TOKEN="root@pam!automation=xxxx-xxxx-xxxx"
python test_proxmox.py

I got a list of all my VMs with their current status.

Step 2: Resource Monitoring and Optimization

The most valuable feature is resource right-sizing. I built a monitor that tracks usage over a week.

resource_monitor.py
import requests
import os
from datetime import datetime
from collections import defaultdict
class HomelabMonitor:
def __init__(self):
self.proxmox_host = os.environ["PROXMOX_HOST"]
self.proxmox_token = os.environ["PROXMOX_API_TOKEN"]
self.metrics_history = defaultdict(list)
def get_headers(self):
return {
"Authorization": f"PVEAPIToken={self.proxmox_token}",
"Content-Type": "application/json"
}
def get_vms(self, node="pve"):
"""List all VMs on a node"""
url = f"{self.proxmox_host}:8006/api2/json/nodes/{node}/qemu"
resp = requests.get(url, headers=self.get_headers(), verify=False)
return resp.json().get("data", [])
def get_vm_rrd(self, node, vmid, timeframe="week"):
"""Get resource usage data for a VM"""
url = f"{self.proxmox_host}:8006/api2/json/nodes/{node}/qemu/{vmid}/rrddata"
params = {"timeframe": timeframe, "cf": "AVERAGE"}
resp = requests.get(url, headers=self.get_headers(),
params=params, verify=False)
return resp.json().get("data", [])
def analyze_resource_usage(self, node="pve"):
"""Analyze resource usage and suggest optimizations"""
vms = self.get_vms(node)
suggestions = []
for vm in vms:
vmid = vm["vmid"]
name = vm["name"]
maxmem = vm.get("maxmem", 0) / (1024**3) # Convert to GB
# Get historical data
rrd_data = self.get_vm_rrd(node, vmid, "week")
if rrd_data:
# Calculate max memory usage
mem_usage_samples = [d.get("mem", 0) for d in rrd_data]
max_mem_used = max(mem_usage_samples)
# Suggest right-sizing if consistently low usage
if max_mem_used < 0.25 and maxmem > 2:
suggested_mem = max(2, max_mem_used * 1.5)
suggestions.append({
"vmid": vmid,
"name": name,
"current_mem_gb": round(maxmem, 1),
"max_used_percent": round(max_mem_used * 100, 1),
"suggested_mem_gb": round(suggested_mem, 1),
"potential_savings_gb": round(maxmem - suggested_mem, 1)
})
return suggestions
def generate_report(self, suggestions):
"""Generate a human-readable optimization report"""
if not suggestions:
return "No optimization opportunities found this week."
report = [f"## Homelab Resource Optimization Report\n"]
report.append(f"**Date:** {datetime.now().strftime('%Y-%m-%d')}\n")
report.append("### Right-Sizing Opportunities\n")
for s in suggestions:
report.append(f"\n**{s['name']}** (VMID: {s['vmid']})")
report.append(f"- Current: {s['current_mem_gb']}GB")
report.append(f"- Peak usage: {s['max_used_percent']}%")
report.append(f"- Suggested: {s['suggested_mem_gb']}GB")
report.append(f"- Savings: {s['potential_savings_gb']}GB")
total_savings = sum(s['potential_savings_gb'] for s in suggestions)
report.append(f"\n**Total potential memory savings: {total_savings:.1f}GB**")
return "\n".join(report)
if __name__ == "__main__":
monitor = HomelabMonitor()
suggestions = monitor.analyze_resource_usage()
print(monitor.generate_report(suggestions))

When I ran this after a week of observation:

Sample Output
## Homelab Resource Optimization Report
**Date:** 2026-03-27
### Right-Sizing Opportunities
**plex** (VMID: 100)
- Current: 8.0GB
- Peak usage: 22.3%
- Suggested: 3.0GB
- Savings: 5.0GB
**sonarr** (VMID: 101)
- Current: 4.0GB
- Peak usage: 18.5%
- Suggested: 2.0GB
- Savings: 2.0GB
**tandoor** (VMID: 102)
- Current: 2.0GB
- Peak usage: 35.2%
- Suggested: 1.5GB
- Savings: 0.5GB
**Total potential memory savings: 7.5GB**

I recovered 7.5GB of RAM by right-sizing just three containers.

Step 3: Pi-hole DNS Automation

Every new service needs a DNS entry. I automated this with the Pi-hole API.

pihole_dns.py
import requests
class PiholeDNS:
def __init__(self, host, password):
self.host = host
self.password = password
self.session = requests.Session()
self._authenticate()
def _authenticate(self):
"""Authenticate with Pi-hole"""
url = f"{self.host}/api/auth"
data = {"password": self.password}
resp = self.session.post(url, json=data)
self.session.headers.update({
"X-XSRF-Token": resp.json()["session"]["sid"]
})
def add_local_domain(self, domain, ip, comment="Auto-added by homelab-automation"):
"""Add a local DNS entry"""
url = f"{self.host}/api/domains/dns/local"
data = {
"domain": domain,
"ip": ip,
"type": "A",
"comment": comment
}
resp = self.session.post(url, json=data)
return resp.json()
def list_domains(self):
"""List all local DNS domains"""
url = f"{self.host}/api/domains/dns/local"
resp = self.session.get(url)
return resp.json()
# Usage
pihole = PiholeDNS("http://pihole.local", os.environ["PIHOLE_PASSWORD"])
pihole.add_local_domain("newservice.home.arpa", "192.168.1.100")

Now when I deploy a new container, the AI automatically adds the DNS entry.

Step 4: Nginx Proxy Manager Integration

The next step is automatic reverse proxy configuration.

npm_config.py
import requests
class NginxProxyManager:
def __init__(self, host, email, password):
self.host = host
self.email = email
self.password = password
self.session = requests.Session()
self._authenticate()
def _authenticate(self):
"""Authenticate with NPM"""
url = f"{self.host}/api/tokens"
data = {
"identity": self.email,
"secret": self.password
}
resp = self.session.post(url, json=data)
self.token = resp.json().get("token")
self.session.headers.update({
"Authorization": f"Bearer {self.token}"
})
def create_proxy_host(self, domain_names, forward_host, forward_port, ssl=True):
"""Create a new proxy host"""
url = f"{self.host}/api/nginx/proxy-hosts"
data = {
"domain_names": domain_names,
"forward_host": forward_host,
"forward_port": forward_port,
"certificate_id": None, # Add SSL cert ID if available
"ssl_forced": ssl,
"http2_support": True,
"block_exploits": True,
"caching_enabled": True
}
resp = self.session.post(url, json=data)
return resp.json()
# Usage
npm = NginxProxyManager(
"http://npm.local:81",
os.environ["NPM_EMAIL"],
os.environ["NPM_PASSWORD"]
)
npm.create_proxy_host(
domain_names=["newservice.home.arpa"],
forward_host="192.168.1.100",
forward_port=8080
)

Step 5: Ollama Watchdog

The most powerful component is the Ollama watchdog. It uses a local LLM to monitor services intelligently.

Install Ollama

Terminal
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a lightweight model
ollama pull llama3.2:3b
# Start Ollama service
ollama serve

Implement the Watchdog

ollama_watchdog.py
import requests
from datetime import datetime
class OllamaWatchdog:
def __init__(self, ollama_host="http://localhost:11434", model="llama3.2:3b"):
self.ollama_host = ollama_host
self.model = model
self.services = self.load_services()
def load_services(self):
"""Load list of services to monitor"""
return [
{"name": "plex", "url": "http://plex:32400", "check": "health"},
{"name": "pihole", "url": "http://pihole/admin", "check": "health"},
{"name": "npm", "url": "http://npm:81", "check": "health"},
{"name": "tandoor", "url": "http://tandoor:8080", "check": "health"},
]
def check_service_health(self, service):
"""Check if a service is responding"""
try:
resp = requests.get(service["url"], timeout=10)
return resp.status_code == 200
except:
return False
def analyze_with_ollama(self, prompt):
"""Use Ollama for intelligent analysis"""
url = f"{self.ollama_host}/api/generate"
data = {
"model": self.model,
"prompt": prompt,
"stream": False
}
resp = requests.post(url, json=data, timeout=60)
return resp.json().get("response", "")
def run_checks(self):
"""Run health checks on all services"""
results = []
problems = []
for service in self.services:
healthy = self.check_service_health(service)
results.append({
"service": service["name"],
"healthy": healthy,
"timestamp": datetime.now().isoformat()
})
if not healthy:
problems.append(service["name"])
# Only report if there are problems
if problems:
analysis_prompt = f"""
The following services are not responding: {', '.join(problems)}
Based on typical homelab configurations, what are the most likely causes
and recommended troubleshooting steps? Keep response under 200 words.
"""
analysis = self.analyze_with_ollama(analysis_prompt)
self.send_alert(problems, analysis)
return results
def send_alert(self, problems, analysis):
"""Send alert notification"""
message = f"""
## Homelab Watchdog Alert
**Time:** {datetime.now().strftime('%Y-%m-%d %H:%M')}
**Problems:** {', '.join(problems)}
### AI Analysis
{analysis}
"""
print(message) # Replace with actual notification (Telegram, email, etc.)
if __name__ == "__main__":
watchdog = OllamaWatchdog()
watchdog.run_checks()

I set this to run every 30 minutes via cron:

Crontab
*/30 * * * * /usr/bin/python3 /opt/homelab/ollama_watchdog.py >> /var/log/watchdog.log 2>&1

Step 6: Automation Rules Configuration

I defined clear rules for what the AI can do automatically versus what needs approval:

automation_rules.yaml
automation_rules:
resource_optimization:
observation_period_days: 7
memory_threshold_percent: 25 # Alert if using <25% of allocated
cpu_threshold_percent: 20
action: "suggest" # Start with suggestions, not auto-apply
new_service_detection:
scan_interval_minutes: 5
auto_add_dns: true
auto_add_proxy: true
auto_add_ssh_key: false # Requires manual approval
watchdog:
check_interval_minutes: 30
alert_on:
- service_down
- memory_spike_percent: 50
- disk_space_percent: 90

Common Mistakes I Made

  1. Giving AI too much autonomy initially - I started with action: "suggest" mode before enabling auto-apply. This prevented accidental resource changes.

  2. Ignoring security boundaries - I created separate API tokens with limited scope for each service. Never use root tokens.

  3. No rollback plan - I now keep backups of configurations before any AI modifications.

  4. Over-alerting - I configured the watchdog to only report significant changes, not every minor fluctuation. Otherwise, I’d be flooded with notifications.

  5. Skipping observation period - I learned to let the AI observe for at least a week before making optimization suggestions. One day of data isn’t enough.

Results After One Month

After running this setup for a month:

MetricBeforeAfterImprovement
Memory allocated48GB36GB12GB recovered
Time per new service30 min5 min25 min saved
Problem detectionReactiveProactiveIssues caught early
NFS IOPS overheadHighOptimizedBetter performance

The biggest win is the 12GB of memory recovered from right-sizing containers. That’s enough to run two additional medium-sized services.

Summary

In this post, I showed how to automate homelab management with AI agents. The key components are:

  1. Proxmox API access for monitoring VM/container resources
  2. Pi-hole API for automatic DNS entries
  3. Nginx Proxy Manager API for reverse proxy configuration
  4. Ollama watchdog for intelligent health monitoring

The key point is starting with observation mode, verifying suggestions, then enabling automation. Let the AI watch your infrastructure for a week before accepting its optimization recommendations.

My homelab now manages itself. The AI handles routine configuration, suggests resource optimizations, and only alerts me when something actually needs attention.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments