
How to Build an Autonomous AI Agent That Runs Scheduled Self-Improvement Cycles

I built an AI agent that was supposed to help me manage tasks. It worked fine for weeks—then I realized it was stuck in the same patterns, making the same mistakes, never getting better.

That’s when I stumbled on a Reddit post about an agent that “dreams” at night. The idea was simple but powerful: what if my agent could run autonomous improvement cycles during off-peak hours?

The Problem With Static Agents

Traditional AI agents are frozen in time. They execute tasks with fixed capabilities:

  • They can’t discover new techniques on their own
  • They don’t learn from their mistakes (unless you manually update prompts)
  • They miss out on the explosion of AI research happening daily
  • They require constant human intervention to improve

I needed something different. An agent that could:

  1. Scan for new AI research and tools autonomously
  2. Reflect on its own performance
  3. Research improvements
  4. Safely evaluate and stage changes

The Dream Cycle Concept

The Reddit post described a “dream cycle”—a scheduled autonomous process that runs when I’m asleep. During this time, the agent analyzes its performance, researches improvements, and prepares staged changes for review.

The poster mentioned something that caught my attention: their agent found research that made it better at researching. A virtuous cycle.

But I also saw a warning in the comments: one user’s agent “bricked itself in a botched upgrade.” That scared me. I needed safety rails.

My First Attempt: Simple Cron Job

I started with a basic approach—just a cron job that would run a script every night:

dream_cycle.sh
#!/bin/bash
# My first naive implementation
# Run improvement cycle
python /opt/agent/improve.py
# Apply changes
git -C /opt/agent pull origin improvements
systemctl restart agent

This was a disaster waiting to happen. No staging. No rollback. No evaluation. I was lucky nothing broke.

Phase 1: Building the Scan System

I needed to start small. The first phase was just gathering information without making any changes.

scanner.py
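from datetime import date, timedelta

import arxiv
from github import Github


class DreamScanner:
    """Scans sources for potential improvements."""

    def scan_arxiv(self, query: str, max_results: int = 10):
        """Scan arXiv for relevant AI research papers, newest first."""
        search = arxiv.Search(
            query=query,
            max_results=max_results,
            sort_by=arxiv.SortCriterion.SubmittedDate,
        )
        # arxiv 2.x fetches results through a Client; Search.results() is deprecated.
        return list(arxiv.Client().results(search))

    def scan_github_trending(self, language: str = "python"):
        """Approximate GitHub trending: top-starred repos created in the last week."""
        # GitHub has no official trending API, so this uses repository search instead.
        since = (date.today() - timedelta(days=7)).isoformat()
        g = Github()  # unauthenticated; authenticate to raise the rate limit
        repos = g.search_repositories(
            query=f"language:{language} created:>{since}", sort="stars", order="desc"
        )
        return list(repos[:10])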

But this was expensive. I was using my best model for everything, even simple API calls. The first week cost me $15 just for scanning.

I realized I needed model routing—lighter models for scanning, powerful models only for reasoning.
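
A minimal sketch of that routing idea, assuming the phase-to-model split I describe in the cost breakdown in this post (Haiku for scanning, Sonnet for reflection, Opus for research and evaluation). The `ROUTES` table and `route_model` function are hypothetical names for illustration, not part of any SDK, and the tier names still need mapping to real model IDs for your provider:

```python
# Hypothetical phase -> model tier table; not part of any SDK.
ROUTES = {
    "scan": "haiku",       # cheap, high-volume calls
    "reflect": "sonnet",   # log analysis
    "research": "opus",    # deep reasoning
    "evaluate": "opus",    # scoring and risk assessment
}


def route_model(phase: str, default: str = "haiku") -> str:
    """Return the model tier for a phase, falling back to the cheapest tier."""
    return ROUTES.get(phase, default)


print(route_model("scan"), route_model("research"))
```

Defaulting unknown phases to the cheapest tier means a typo in a phase name costs quality rather than money, which seemed like the safer failure mode for a nightly job.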

The Cost Problem

I broke down my nightly costs:

cost-analysis.txt
Phase       | Model  | Calls | Cost/Night
------------|--------|-------|-----------
Scan arXiv  | Haiku  | 50    | $0.05
Scan GitHub | Haiku  | 30    | $0.03
Reflect     | Sonnet | 20    | $0.15
Research    | Opus   | 5     | $0.12
Evaluate    | Opus   | 3     | $0.05
------------|--------|-------|-----------
Total       |        |       | ~$0.40

With proper model routing, I got the cost down to $0.40/night. That’s $12/month for an agent that improves itself. Worth it.
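
The arithmetic behind those figures is easy to check. A quick sketch using the per-phase costs from the table above, assuming a 30-day month (the dictionary keys are just labels chosen for this example):

```python
# Per-phase nightly costs in USD, copied from the cost table.
NIGHTLY_COSTS = {
    "scan_arxiv": 0.05,
    "scan_github": 0.03,
    "reflect": 0.15,
    "research": 0.12,
    "evaluate": 0.05,
}

nightly = round(sum(NIGHTLY_COSTS.values()), 2)
monthly = round(nightly * 30, 2)  # assumes a 30-day month

print(f"${nightly:.2f}/night, ${monthly:.2f}/month")  # $0.40/night, $12.00/month
```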

Phase 2: Adding Reflection

Scanning alone wasn’t enough. The agent needed to analyze its own performance data.

reflector.py
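from datetime import datetime, timedelta
from typing import Dict, List


class PerformanceReflector:
    """Analyzes recent agent performance."""

    # Illustrative defaults; tune these to your own agent's workload.
    def __init__(self, slow_seconds: float = 30.0, max_failures: int = 5,
                 baseline_seconds: float = 10.0):
        self.slow_seconds = slow_seconds
        self.max_failures = max_failures
        self.baseline_seconds = baseline_seconds

    def analyze_logs(self, days: int = 7) -> Dict:
        """Parse and analyze recent logs."""
        # _fetch_logs and _extract_patterns are helpers not shown in this article.
        logs = self._fetch_logs(since=datetime.now() - timedelta(days=days))
        durations = [log.duration for log in logs]
        return {
            "total_tasks": len(logs),
            "failures": [log for log in logs if log.status == "failed"],
            "slow_tasks": [log for log in logs if log.duration > self.slow_seconds],
            "avg_duration": sum(durations) / len(durations) if durations else 0.0,
            "common_errors": self._extract_patterns(logs),
        }

    def identify_bottlenecks(self, analysis: Dict) -> List[str]:
        """Find areas for improvement."""
        bottlenecks = []
        if len(analysis["failures"]) > self.max_failures:
            bottlenecks.append("high_failure_rate")
        if analysis.get("avg_duration", 0.0) > self.baseline_seconds:
            bottlenecks.append("performance_degradation")
        return bottlenecks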

This helped the agent understand what to improve instead of applying changes at random.
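
The `_extract_patterns` helper is not shown in the listing. One plausible approach, sketched here as a standalone function, is to mask the variable parts of each error message and count what is left. The name `extract_patterns` and the masking rules are assumptions for illustration, not the code the agent actually runs:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def extract_patterns(messages: Iterable[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Group error messages by shape and return the most common shapes.

    Quoted strings and digit runs are masked, so 'Timeout after 30s' and
    'Timeout after 45s' count as the same pattern.
    """
    counts: Counter = Counter()
    for msg in messages:
        shape = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", msg)  # mask quoted values
        shape = re.sub(r"\d+", "<n>", shape)                 # mask numbers
        counts[shape.strip().lower()] += 1
    return counts.most_common(top_n)


errors = ["Timeout after 30s", "Timeout after 45s", "KeyError: 'task_id'"]
print(extract_patterns(errors))
```

Masking before counting matters because raw log lines are almost never identical; without it, every timeout would look like a unique error.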

The Safety Scare

After two weeks of smooth operation, I woke up to a broken agent. It had:

  1. Found a “performance improvement” on GitHub
  2. Applied it directly to its core logic
  3. Broken the task execution pipeline

Everything stopped working. I had to manually revert the changes. That’s when I implemented the SafeImprovement class:

safe_improvement.py
import logging
import subprocess
from dataclasses import dataclass
from typing import Optional

REPO = "/opt/agent"


@dataclass
class SafeImprovement:
    """Ensures improvements can be safely rolled back."""

    improvement_id: str
    checkpoint: Optional[str] = None

    def __post_init__(self):
        # Record the current commit before anything is changed.
        self.checkpoint = self._create_checkpoint()

    def _create_checkpoint(self) -> str:
        """Create a checkpoint before applying changes."""
        result = subprocess.check_output(
            ["git", "-C", REPO, "rev-parse", "HEAD"]
        ).decode().strip()
        logging.info(f"Created checkpoint: {result}")
        return result

    def apply(self, changes: str) -> bool:
        """Apply changes with automatic rollback on failure."""
        try:
            # Stage changes
            self._stage_changes(changes)
            # Run tests
            if not self._verify_health():
                self.rollback()
                return False
            return True
        except Exception as e:
            logging.error(f"Improvement failed: {e}")
            self.rollback()
            return False

    def _stage_changes(self, changes: str) -> None:
        """Apply the change set to the working tree.

        This helper is not shown in the article; this version assumes
        `changes` is a unified diff and feeds it to `git apply` on stdin.
        """
        subprocess.run(
            ["git", "-C", REPO, "apply"], input=changes.encode(), check=True
        )

    def rollback(self):
        """Rollback tracked files to the checkpoint."""
        # check=True so a failed reset raises instead of passing silently.
        # Note: reset --hard does not delete untracked files a change created.
        subprocess.run(
            ["git", "-C", REPO, "reset", "--hard", self.checkpoint], check=True
        )
        logging.info(f"Rolled back to {self.checkpoint}")

    def _verify_health(self) -> bool:
        """Run test suite to verify changes."""
        result = subprocess.run(
            ["pytest", "/opt/agent/tests/", "-v", "--tb=short"],
            capture_output=True,
        )
        return result.returncode == 0

Now every improvement gets a checkpoint. If tests fail, automatic rollback. No more bricked agents.
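
The same checkpoint-and-rollback idea can also be written as a context manager, which makes it harder to forget the rollback path. This is a sketch under assumptions, not the class above: `run_git` and `checkpoint` are hypothetical names, and the extra `git clean -fd` is my addition because `git reset --hard` leaves untracked files in place. It is destructive, so try it on a scratch repository first:

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


def run_git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    return subprocess.check_output(["git", "-C", repo, *args], text=True).strip()


@contextmanager
def checkpoint(repo: str, git: Callable[..., str] = run_git) -> Iterator[str]:
    """Record HEAD on entry; on any exception, restore it and re-raise."""
    head = git(repo, "rev-parse", "HEAD")
    try:
        yield head
    except Exception:
        git(repo, "reset", "--hard", head)  # restore tracked files
        git(repo, "clean", "-fd")           # remove untracked files and directories
        raise
```

Usage would look like `with checkpoint("/opt/agent"): apply_and_test()`, where `apply_and_test` is whatever function stages the change and raises if the tests fail. The `git` parameter exists only so the logic can be exercised with a fake in tests.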

The Full Dream Cycle Architecture

After several iterations, I arrived at this architecture:

dream-cycle-phases.txt
┌─────────────────────────────────────────────────────────────────┐
│                        DREAM CYCLE FLOW                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  11:15 PM - Phase 1: SCAN                                       │
│  ┌─────────────────┐    ┌─────────────────┐                     │
│  │ arXiv Papers    │    │ GitHub Trending │                     │
│  │ (Haiku, $0.05)  │    │ (Haiku, $0.03)  │                     │
│  └────────┬────────┘    └────────┬────────┘                     │
│           │                      │                              │
│           └──────────┬───────────┘                              │
│                      ▼                                          │
│  11:45 PM - Phase 2: REFLECT                                    │
│   ┌─────────────────────────────────────┐                       │
│   │ Analyze logs, find bottlenecks      │                       │
│   │ (Sonnet, $0.15)                     │                       │
│   └──────────────────┬──────────────────┘                       │
│                      ▼                                          │
│  12:30 AM - Phase 3: RESEARCH                                   │
│   ┌─────────────────────────────────────┐                       │
│   │ Deep-dive into promising areas      │                       │
│   │ (Opus, $0.12)                       │                       │
│   └──────────────────┬──────────────────┘                       │
│                      ▼                                          │
│  1:00 AM - Phase 4: EVALUATE                                    │
│   ┌─────────────────────────────────────┐                       │
│   │ Score improvements, stage changes   │                       │
│   │ (Opus, $0.05)                       │                       │
│   └──────────────────┬──────────────────┘                       │
│                      ▼                                          │
│  4:00 AM - Phase 5: BUILD                                       │
│   ┌─────────────────────────────────────┐                       │
│   │ Run tests, create PR if passing     │                       │
│   │ (Automated)                         │                       │
│   └─────────────────────────────────────┘                       │
│                                                                 │
│  Total Cost: ~$0.40/night (~$12/month)                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Putting It All Together

The final DreamCycle class orchestrates everything:

dream_cycle.py
import logging
import subprocess
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

# ModelRouter is the model-routing wrapper used below; its implementation
# is not shown in this article.


@dataclass
class Improvement:
    title: str
    description: str
    impact_score: float
    risk_level: str
    staged_path: str


class DreamCycle:
    """Orchestrates the full dream cycle."""

    def __init__(self, config: dict):
        self.config = config
        # Route tasks to appropriate models
        self.scanner = ModelRouter("haiku")      # Fast, cheap
        self.reflector = ModelRouter("sonnet")   # Balanced
        self.researcher = ModelRouter("opus")    # Capable, expensive

    def run(self) -> List[Improvement]:
        """Execute the full dream cycle."""
        # Phase 1: Scan sources
        scan_results = self._scan_sources()
        logging.info(f"Scanned {len(scan_results)} items")
        # Phase 2: Reflect on performance
        reflection = self._reflect_on_performance()
        # .get() because the analysis dict may not carry a "bottlenecks" key
        logging.info(f"Identified {len(reflection.get('bottlenecks', []))} bottlenecks")
        # Phase 3: Research promising areas
        research = self._research(scan_results, reflection)
        logging.info(f"Researched {len(research)} areas")
        # Phase 4: Evaluate and stage changes
        improvements = self._evaluate_and_stage(research)
        logging.info(f"Staged {len(improvements)} improvements")
        return improvements

    def _scan_sources(self) -> List[Dict]:
        """Scan arXiv and GitHub."""
        results = []
        # Scan arXiv for relevant papers
        papers = self.scanner.scan_arxiv(
            query="AI agents autonomous systems",
            max_results=20,
        )
        results.extend(papers)
        # Scan GitHub trending
        repos = self.scanner.scan_github(language="python")
        results.extend(repos)
        return results

    def _reflect_on_performance(self) -> Dict:
        """Analyze recent agent performance."""
        return self.reflector.analyze_logs(days=7)

    def _research(self, scan_results: List, reflection: Dict) -> List[Dict]:
        """Deep-dive into promising areas."""
        return self.researcher.synthesize(scan_results, reflection)

    def _evaluate_and_stage(self, research: List[Dict]) -> List[Improvement]:
        """Score and stage improvements safely."""
        improvements = []
        for item in research:
            score = self.researcher.score_impact(item)
            risk = self.researcher.assess_risk(item)
            if score > self.config["min_impact_score"]:
                staged_path = self._create_staging_branch(item)
                improvements.append(Improvement(
                    title=item["title"],
                    description=item["description"],
                    impact_score=score,
                    risk_level=risk,
                    staged_path=staged_path,
                ))
        return improvements

    def _create_staging_branch(self, item: Dict) -> str:
        """Create a staging branch for the improvement."""
        branch_name = f"improvement/{datetime.now().strftime('%Y%m%d')}-{item['id']}"
        # `git branch` creates the branch without checking it out, so the live
        # working tree stays put and later branches do not stack on earlier ones.
        subprocess.run(
            ["git", "-C", "/opt/agent", "branch", branch_name], check=True
        )
        return branch_name

The Cron Configuration

Finally, the cron jobs that trigger everything:

cron
# /etc/cron.d/agent-dream-cycle
# Dream cycle runs at 11:15 PM daily
15 23 * * * agent-user /opt/agent/scripts/dream_cycle.sh
# Build staged changes at 4 AM
0 4 * * * agent-user /opt/agent/scripts/build_changes.sh

What I Learned

After running this for a month, here’s what actually happened:

Wins:

  • The agent discovered a paper on “tool use optimization” and applied techniques that reduced task execution time by 23%
  • It found a GitHub repo with better error handling patterns
  • Cost stayed around $0.38-0.42/night consistently

Failures:

  • One “improvement” was just removing comments to “optimize code”—not helpful
  • Another tried to switch from async to sync execution (tests caught it)
  • Some nights produced no improvements at all (which is fine)

The meta-learning moment: The most interesting thing happened around week three. The agent found research on “improving agent research capabilities.” It applied those techniques to its own research phase. Now it’s better at finding improvements.

This is the virtuous cycle I was hoping for.

Common Mistakes to Avoid

Mistake                   | What Happens                         | Fix
--------------------------|--------------------------------------|---------------------------------------
No rollback mechanism     | Agent breaks, manual recovery needed | Always checkpoint + automatic rollback
Using Opus for everything | $5/night instead of $0.40            | Route tasks to appropriate models
Applying changes directly | Catastrophic failures                | Stage everything, require tests
No improvement budget     | Runaway complexity                   | Set limits on changes per cycle
Skipping human review     | Dangerous changes deployed           | Require approval for high-risk items
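
The last two rows, an improvement budget and human approval for high-risk items, can be combined into one small gate. A minimal sketch, assuming three risk labels and a cap of three changes per cycle; `Candidate`, `select_for_cycle`, and those defaults are hypothetical, chosen only to illustrate the idea:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    title: str
    impact_score: float
    risk_level: str  # assumed labels: "low", "medium", "high"


def select_for_cycle(
    candidates: List[Candidate], max_changes: int = 3
) -> Tuple[List[Candidate], List[Candidate]]:
    """Keep the top `max_changes` candidates by impact, then split them into
    (auto_stage, needs_review); high-risk items always go to human review."""
    ranked = sorted(candidates, key=lambda c: c.impact_score, reverse=True)
    kept = ranked[:max_changes]
    auto_stage = [c for c in kept if c.risk_level != "high"]
    needs_review = [c for c in kept if c.risk_level == "high"]
    return auto_stage, needs_review
```

Applying the budget before the risk split means a flood of low-value candidates cannot crowd the review queue either; everything past the cap simply waits for a later cycle.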

When to Use This Approach

This architecture works well when:

  • You have a stable agent with good test coverage
  • You can afford ~$12/month for autonomous improvement
  • You want continuous evolution without constant manual updates
  • You’re okay with staged changes requiring human approval

It doesn’t work when:

  • Your agent is still in rapid development
  • You need guarantees about behavior stability
  • You don’t have automated tests
  • Budget is extremely tight

Final Architecture

Here’s the complete system I ended up with:

system-architecture.txt
┌─────────────────────────────────────────────────────────────────┐
│                     AUTONOMOUS AGENT SYSTEM                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   PRODUCTION AGENT                                              │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  - Task execution                                       │   │
│   │  - Performance logging                                  │   │
│   │  - Error tracking                                       │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                │                                │
│                                ▼                                │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                     LOGS & METRICS                      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                │                                │
│                                ▼                                │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                       DREAM CYCLE                       │   │
│   │                                                         │   │
│   │ [Scan] → [Reflect] → [Research] → [Evaluate] → [Stage]  │   │
│   │                                                         │   │
│   │  Cost: ~$0.40/night                                     │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                │                                │
│                                ▼                                │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                      STAGING AREA                       │   │
│   │                                                         │   │
│   │  - Automated tests                                      │   │
│   │  - Human review queue                                   │   │
│   │  - Rollback checkpoints                                 │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                │                                │
│                                ▼                                │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                     IMPROVED AGENT                      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Getting Started

If you want to build something similar:

  1. Start with scanning only. Don’t enable autonomous changes until you trust the scanning phase.
  2. Add tests first. The SafeImprovement class only works if you have tests to run.
  3. Use model routing. This is the key to keeping costs down.
  4. Stage everything. Never apply changes directly to production.
  5. Require human approval. At least for the first month.

The Reddit poster was right—agents can dream. But like any dream, sometimes they need supervision.


Final Words

The dream cycle approach transformed my static agent into something that evolves. It’s not perfect—I still review the staged changes every morning—but it’s discovered improvements I never would have found on my own. The key is balancing autonomy with safety. Start small, add safeguards early, and let the agent earn your trust over time.
