What Free LLM Models Actually Work for Scheduled Tasks and Cron Jobs?
I was running 15 cron jobs with ChatGPT Plus and hit the weekly limit by Wednesday. My scheduled tasks—log analysis, report generation, alert summarization—started failing silently. That’s when I realized paid API services don’t scale well for automation.
Here’s what I learned after switching to free local LLMs for cron jobs.
The Problem with Paid LLM APIs for Automation
I had a simple setup: multiple cron jobs that would call the ChatGPT API to process data at regular intervals. Each job was small—maybe 500-1000 tokens per request. But 15 jobs running multiple times per day added up fast.
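To make "added up fast" concrete, here is a rough back-of-the-envelope sketch. The 15 jobs and the 500-1000 token range come from the setup above; the figure of 8 runs per day is purely an assumption for illustration, since the real schedules varied by job.

```python
def weekly_tokens(jobs: int, runs_per_day: int, tokens_per_request: int, days: int = 7) -> int:
    """Estimate the total tokens consumed per week by scheduled jobs."""
    return jobs * runs_per_day * tokens_per_request * days


# 15 jobs from the article; 8 runs/day is a hypothetical value.
low = weekly_tokens(jobs=15, runs_per_day=8, tokens_per_request=500)
high = weekly_tokens(jobs=15, runs_per_day=8, tokens_per_request=1000)
print(f"{low:,} to {high:,} tokens per week")
```

Even small requests multiply quickly once they run unattended around the clock.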
    Monday:    ████████░░░░░░░░░░░░  40%
    Tuesday:   ██████████████░░░░░░  70%
    Wednesday: ████████████████████ 100% (LIMIT HIT)
    Thursday:  ░░░░░░░░░░░░░░░░░░░░   0% (BLOCKED)

The errors started appearing in my logs:

    [Wed 10:00] Cron job 'daily_summary' failed: Rate limit exceeded
    [Wed 10:30] Cron job 'log_analyzer' failed: Rate limit exceeded
    [Wed 11:00] Cron job 'alert_checker' failed: Rate limit exceeded

Even with a paid subscription, I was hitting caps designed for interactive use, not automated batch processing.
The Local LLM Alternative
I started experimenting with Ollama, a tool that runs LLMs locally on my machine. The appeal was obvious: no rate limits, no per-token costs, and no external dependencies.
    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull a model
    ollama pull qwen2.5:7b

My first attempt was with Qwen 2.5 7B, a smaller model that could run on my 16GB RAM machine. I set up a simple cron job:

    # Test basic cron integration
    */30 * * * * ollama run qwen2.5:7b "Summarize in one sentence: $(tail -20 /var/log/syslog)" >> /var/log/llm_summary.log

It worked. But the quality was inconsistent for complex tasks.
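Passing log text through shell quoting in a crontab line gets fragile as prompts grow. One alternative is to call Ollama's local HTTP API from a small Python script. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with `"stream": false`; the helper names `build_request` and `generate` are hypothetical, not part of any library.

```python
import json
import urllib.request

# Default local Ollama endpoint (an assumption; adjust if your install differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request; json.dumps handles all escaping."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama instance):
#   summary = generate("qwen2.5:7b", "Summarize in one sentence: " + log_tail)
```

Because the prompt is serialized as JSON rather than interpolated into a shell string, quotes and newlines in the log excerpt cannot break the command.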
Which Models Actually Work?
After testing several models, I found clear differences in their suitability for automation tasks:
Model         | Size  | Speed     | Quality   | Best For
--------------|-------|-----------|-----------|------------------
Kimi K2.5     | ~30B  | Slow      | High      | Complex reasoning
GLM 5         | ~9B   | Fast      | Medium    | Structured tasks
Qwen 2.5 7B   | 7B    | Fast      | Medium    | Simple extraction
Qwen 2.5 397B | 397B  | Very slow | Very high | One-off analysis
Minimax       | ~30B  | Medium    | High      | General purpose

Kimi K2.5: Best for Complex Automation
Kimi excelled at tasks requiring reasoning:
    import json
    import subprocess

    def analyze_metrics(metrics_path):
        """Run Kimi on metrics data for anomaly detection."""
        with open(metrics_path) as f:
            data = f.read()

        prompt = f"""Analyze these metrics and identify any anomalies.
    Return JSON with 'status', 'anomalies', and 'recommendations'.

    Metrics:
    {data}
    """

        result = subprocess.run(
            ["ollama", "run", "kimi", prompt],
            capture_output=True,
            text=True,
            timeout=300,  # 5 min timeout
        )

        # Raises json.JSONDecodeError if the model wraps the JSON in extra text
        return json.loads(result.stdout)

    # Cron job calls this every hour
    if __name__ == "__main__":
        result = analyze_metrics("/data/metrics.json")
        if result["status"] == "alert":
            # send_alert is your own notification helper, defined elsewhere
            send_alert(result["anomalies"])

The tradeoff: Kimi is slow. A typical request takes 30-60 seconds on my hardware. For cron jobs running hourly or daily, this is acceptable.
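Calling `json.loads` directly on model output is brittle, because local models often surround the JSON with a sentence of preamble or a closing remark. A minimal sketch of a more forgiving parser follows; `extract_json` is a hypothetical helper name, and it only recovers the first JSON object it can decode.

```python
import json
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Return the first decodable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever trailing text follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None
```

In a cron job, a `None` result can be treated like any other failure: log it and retry or fall back, rather than letting an unhandled exception kill the run.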
GLM 5: Reliable for Structured Tasks
GLM 5 surprised me with its consistency for predictable tasks:
    import subprocess
    from datetime import datetime

    def generate_daily_report(log_path, output_path):
        """Generate structured daily report from logs."""
        with open(log_path) as f:
            logs = f.read()

        # Last 5000 chars (kept outside the prompt so the comment is not sent to the model)
        recent_logs = logs[-5000:]

        prompt = f"""Generate a daily report from these logs.
    Use this exact format:

    ## Daily Report - [DATE]
    - Errors: [COUNT]
    - Warnings: [COUNT]
    - Top Issues: [BULLETED LIST]
    - Recommendations: [BULLETED LIST]

    Logs:
    {recent_logs}
    """

        result = subprocess.run(
            ["ollama", "run", "glm", prompt],
            capture_output=True,
            text=True,
            timeout=180,  # local inference can hang, so always set a timeout
        )

        report = result.stdout.replace("[DATE]", datetime.now().strftime("%Y-%m-%d"))

        with open(output_path, "w") as f:
            f.write(report)

    # Crontab: 0 9 * * * python3 /scripts/cron_glm.py

GLM is faster, usually 5-15 seconds per request, and more consistent in following output formats.
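Even a model that follows formats well will occasionally drop a section, so it can help to check the report before it is written or sent. The sketch below assumes the exact section labels from the prompt template above; `missing_sections` is a hypothetical helper name.

```python
from typing import List

# Labels taken from the report template in the prompt.
REQUIRED_MARKERS = (
    "## Daily Report",
    "- Errors:",
    "- Warnings:",
    "- Top Issues:",
    "- Recommendations:",
)


def missing_sections(report: str) -> List[str]:
    """Return the template labels that do not appear in the generated report."""
    return [marker for marker in REQUIRED_MARKERS if marker not in report]
```

An empty list means every expected label is present; anything else is a reasonable trigger for a retry or a fallback model.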
Qwen 2.5: Flexible Size Options
Qwen offers the most flexibility:
Size   | RAM Needed | Speed     | Use Case
-------|------------|-----------|-------------------------------
7B     | 8GB        | Very fast | Simple extraction, formatting
14B    | 16GB       | Fast      | Moderate complexity
32B    | 32GB       | Medium    | Balanced tasks
397B   | 128GB+     | Very slow | Maximum quality (not for cron)

For most cron jobs, the 7B or 14B variants are sufficient:
    name: Daily AI Report
    on:
      schedule:
        - cron: '0 9 * * *'

    jobs:
      report:
        runs-on: self-hosted  # Needs Ollama installed
        steps:
          - name: Generate Report
            run: |
              # Build the payload with jq so quotes inside data.json are escaped correctly
              jq -n --arg data "$(cat data.json)" \
                '{model: "qwen2.5:7b", prompt: ("Create summary from " + $data), stream: false}' \
                | curl -s http://localhost:11434/api/generate -d @- \
                | jq -r .response > report.md

The Speed Tradeoff
The main difference from cloud APIs is speed. Here’s a realistic comparison:
Service      | Avg Response | P99 Response | Reliability
-------------|--------------|--------------|------------
ChatGPT API  | 2-5 sec      | 10 sec       | 99.9%
Claude API   | 3-7 sec      | 15 sec       | 99.9%
Kimi (local) | 30-60 sec    | 120 sec      | 99.5%
GLM (local)  | 5-15 sec     | 45 sec       | 99.8%
Qwen 7B      | 3-10 sec     | 30 sec       | 99.9%

For cron jobs, this matters less. My daily report generation doesn't care if it takes 30 seconds or 3 seconds; it just needs to complete before I check my inbox.
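Numbers like these depend heavily on hardware, so it is worth measuring your own. The following is a minimal sketch for collecting per-run durations and summarizing them; it uses the nearest-rank definition of a percentile, and the helper names (`timed`, `percentile`, `summarize`) are hypothetical.

```python
import math
import time
from typing import Callable, Dict, List


def timed(fn: Callable[[], object]) -> float:
    """Run fn once and return how long it took, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value covering pct% of the samples."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return the average and P99 of a list of durations."""
    return {
        "avg": sum(durations) / len(durations),
        "p99": percentile(durations, 99),
    }
```

Appending each run's duration to a file and summarizing it weekly is usually enough to tell whether a job still fits comfortably inside its cron interval.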
Setting Up a Robust Cron + LLM Pipeline
After several iterations, I landed on this architecture:
    ┌─────────────┐     ┌──────────────┐     ┌──────────────┐
    │    Cron     │────▶│  LLM Worker  │────▶│   Notifier   │
    │  Scheduler  │     │   (Ollama)   │     │ (Email/Slack)│
    └─────────────┘     └──────────────┘     └──────────────┘
           │                    │
           ▼                    ▼
    ┌─────────────┐     ┌──────────────┐
    │    Retry    │     │   Fallback   │
    │    Queue    │     │   (Smaller   │
    │             │     │    Model)    │
    └─────────────┘     └──────────────┘

The Retry Mechanism
Local LLMs can fail—GPU memory issues, model loading problems, or just timeouts. I built retry logic:
    import subprocess
    import time
    import logging
    from typing import Optional

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    MODELS = {
        "primary": "kimi",        # Best quality
        "fallback": "glm",        # Faster backup
        "minimal": "qwen2.5:7b",  # Last resort
    }

    def run_with_retry(prompt: str, max_retries: int = 3) -> Optional[str]:
        """Run LLM with fallback models on failure."""
        for attempt in range(max_retries):
            # Try models in priority order
            for model_tier, model_name in MODELS.items():
                try:
                    logger.info(f"Attempt {attempt+1} with {model_name}")

                    result = subprocess.run(
                        ["ollama", "run", model_name, prompt],
                        capture_output=True,
                        text=True,
                        timeout=180,  # 3 min timeout
                    )

                    if result.returncode == 0 and result.stdout.strip():
                        logger.info(f"Success with {model_name}")
                        return result.stdout

                except subprocess.TimeoutExpired:
                    logger.warning(f"Timeout with {model_name}")
                    continue
                except Exception as e:
                    logger.error(f"Error with {model_name}: {e}")
                    continue

            # Wait before retrying
            if attempt < max_retries - 1:
                time.sleep(10)

        logger.error("All models failed")
        return None

    # Example cron usage
    if __name__ == "__main__":
        result = run_with_retry("Summarize today's server logs: ...")
        if result:
            save_result(result)
        else:
            alert_on_failure()

Health Checks
I added a monitoring job to track model availability:
    import json
    import subprocess
    from datetime import datetime

    def check_ollama_health():
        """Check if Ollama and models are responsive."""
        timestamp = datetime.now().isoformat()
        try:
            # Check Ollama is running; check=True raises if the command fails
            subprocess.run(
                ["ollama", "list"],
                capture_output=True,
                text=True,
                timeout=5,
                check=True,
            )
        except Exception:
            return {"timestamp": timestamp, "ollama_running": False}

        # Quick test each model
        health = {}
        for model in ["kimi", "glm", "qwen2.5:7b"]:
            try:
                test = subprocess.run(
                    ["ollama", "run", model, "test"],
                    capture_output=True,
                    text=True,
                    timeout=30,
                )
                health[model] = test.returncode == 0
            except Exception:
                health[model] = False

        return {
            "timestamp": timestamp,
            "ollama_running": True,
            "models": health,
        }

    if __name__ == "__main__":
        # Print the status as JSON so the cron job can append it to a log file
        print(json.dumps(check_ollama_health()))

    # Run every 5 minutes: */5 * * * * python3 health_check.py

What I Learned
After running this setup for a month:
- Local LLMs are reliable enough for automation. I’ve had 99%+ uptime with proper retry logic.
- Model selection matters. Kimi for complex reasoning, GLM for structured tasks, Qwen for quick jobs.
- Speed is acceptable. My cron jobs run at fixed intervals. Whether they complete in 5 seconds or 50 seconds rarely matters.
- Zero rate limits changes behavior. I no longer worry about how many jobs I run. I added more automation because the marginal cost is zero.
- Privacy is a bonus. My logs and data never leave my machine.
Common Mistakes to Avoid
I made these mistakes so you don’t have to:
- Using large models for simple tasks. Don’t use a 397B model for log parsing. Match model size to task complexity.
- No timeout handling. Local inference can hang. Always set timeouts.
- No fallback models. When your primary model fails (and it will), have backups ready.
- Ignoring hardware limits. Running out of GPU memory mid-inference crashes jobs. Monitor resources.
- Assuming all “free” models are equal. Quality varies dramatically. Test before deploying to production cron jobs.
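The hardware-limits mistake in particular can be guarded against in code. Below is a minimal sketch that picks a Qwen 2.5 variant from the machine's total RAM, using the RAM thresholds from the Qwen size table earlier in this article. The helper names are hypothetical, and `total_ram_gb` relies on `os.sysconf`, so it is an assumption that the job runs on Linux or a similar Unix system.

```python
import os
from typing import Optional

# (minimum RAM in GB, Ollama model tag), largest first.
# Thresholds follow the Qwen size table in this article.
MODEL_BY_RAM = [
    (32, "qwen2.5:32b"),
    (16, "qwen2.5:14b"),
    (8, "qwen2.5:7b"),
]


def pick_model(ram_gb: float) -> Optional[str]:
    """Return the largest model whose RAM requirement fits, or None if none do."""
    for min_gb, tag in MODEL_BY_RAM:
        if ram_gb >= min_gb:
            return tag
    return None


def total_ram_gb() -> float:
    """Total physical RAM in GiB (Linux and similar Unix systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


# Example: model = pick_model(total_ram_gb())
```

Note that the operating system usually reports slightly less than the nominal amount (a "16GB" machine often shows around 15.5), so this check errs toward the smaller model, which is the safer direction for unattended jobs.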
When to Stick with Cloud APIs
Local LLMs aren’t always the answer. Keep cloud APIs for:
- Real-time user-facing features (latency matters)
- Mobile or low-power devices
- Tasks requiring the absolute best model quality
- One-off tasks (not worth setting up local inference)
Summary
For scheduled tasks and cron jobs, free local LLMs through Ollama offer a compelling alternative to paid APIs. The key is matching model to task: Kimi K2.5 for complex reasoning, GLM 5 for structured work, and Qwen 2.5 for flexibility.
The main tradeoff is speed, but for automation that runs unattended, it rarely matters. What matters is that my 15 cron jobs now run without limits, without cost, and without worrying about hitting a Wednesday API cap.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 OpenClaw - Open Source Claude Alternative
- 👨💻 Ollama - Run LLMs Locally
- 👨💻 Reddit Discussion on Free LLM for Cron Jobs
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!