How to Build an Autonomous AI Agent That Runs Scheduled Self-Improvement Cycles
I built an AI agent that was supposed to help me manage tasks. It worked fine for weeks—then I realized it was stuck in the same patterns, making the same mistakes, never getting better.
That’s when I stumbled on a Reddit post about an agent that “dreams” at night. The idea was simple but powerful: what if my agent could run autonomous improvement cycles during off-peak hours?
The Problem With Static Agents
Traditional AI agents are frozen in time. They execute tasks with fixed capabilities:
- They can’t discover new techniques on their own
- They don’t learn from their mistakes (unless you manually update prompts)
- They miss out on the explosion of AI research happening daily
- They require constant human intervention to improve
I needed something different. An agent that could:
- Scan for new AI research and tools autonomously
- Reflect on its own performance
- Research improvements
- Safely evaluate and stage changes
The Dream Cycle Concept
The Reddit post described a “dream cycle”—a scheduled autonomous process that runs when I’m asleep. During this time, the agent analyzes its performance, researches improvements, and prepares staged changes for review.
The poster mentioned something that caught my attention: their agent found research that made it better at researching. A virtuous cycle.
But I also saw a warning in the comments: one user’s agent “bricked itself in a botched upgrade.” That scared me. I needed safety rails.
My First Attempt: Simple Cron Job
I started with a basic approach—just a cron job that would run a script every night:

    #!/bin/bash
    # My first naive implementation

    # Run improvement cycle
    python /opt/agent/improve.py

    # Apply changes
    git -C /opt/agent pull origin improvements
    systemctl restart agent

This was a disaster waiting to happen. No staging. No rollback. No evaluation. I was lucky nothing broke.
Phase 1: Building the Scan System
I needed to start small. The first phase was just gathering information without making any changes.
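
    import datetime

    import arxiv
    from github import Github


    class DreamScanner:
        """Scans sources for potential improvements."""

        def scan_arxiv(self, query: str, max_results: int = 10):
            """Scan arXiv for the most recent papers matching the query."""
            search = arxiv.Search(
                query=query,
                max_results=max_results,
                sort_by=arxiv.SortCriterion.SubmittedDate,
            )
            # Search.results() is deprecated in arxiv 2.x; fetch through a Client
            return list(arxiv.Client().results(search))

        def scan_github_trending(self, language: str = "python", limit: int = 10):
            """Approximate "trending": most-starred repos created in the last week.

            GitHub has no official trending API, so this uses repository search.
            """
            since = (datetime.date.today() - datetime.timedelta(days=7)).isoformat()
            results = Github().search_repositories(
                query=f"language:{language} created:>{since}",
                sort="stars",
                order="desc",
            )
            return list(results[:limit])

But this was expensive. I was using my best model for everything, even simple API calls. The first week cost me $15 just for scanning.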
I realized I needed model routing—lighter models for scanning, powerful models only for reasoning.
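A minimal sketch of that routing idea, using the same haiku/sonnet/opus tiers as the rest of this article. `PHASE_TIERS`, `route`, `run_phase`, and `echo_client` are hypothetical names, and `call_model` stands in for whatever LLM client you actually use; nothing here is a real provider API.

```python
from typing import Callable, Dict

# Hypothetical routing table: the cheapest tier handles bulk scanning,
# stronger tiers are reserved for phases where reasoning quality matters.
PHASE_TIERS: Dict[str, str] = {
    "scan": "haiku",
    "reflect": "sonnet",
    "research": "opus",
    "evaluate": "opus",
}


def route(phase: str) -> str:
    """Return the model tier for a phase; unknown phases get the cheapest tier."""
    return PHASE_TIERS.get(phase, "haiku")


def run_phase(phase: str, prompt: str,
              call_model: Callable[[str, str], str]) -> str:
    """Send a prompt to whichever tier the phase is routed to.

    call_model(tier, prompt) is a placeholder for your real LLM client,
    which is also where a tier name gets mapped to a concrete model ID.
    """
    return call_model(route(phase), prompt)


def echo_client(tier: str, prompt: str) -> str:
    """Fake client for local testing: reports which tier handled the prompt."""
    return f"[{tier}] {prompt}"


print(run_phase("scan", "summarize this abstract", echo_client))
# [haiku] summarize this abstract
```

Falling back to the cheapest tier means a mistyped phase name costs you quality rather than money; raising a `KeyError` instead is an equally reasonable choice.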
The Cost Problem
I broke down my nightly costs:
| Phase | Model | Calls | Cost/Night |
|---|---|---|---|
| Scan arXiv | Haiku | 50 | $0.05 |
| Scan GitHub | Haiku | 30 | $0.03 |
| Reflect | Sonnet | 20 | $0.15 |
| Research | Opus | 5 | $0.12 |
| Evaluate | Opus | 3 | $0.05 |
| Total | | | ~$0.40 |

With proper model routing, I got the cost down to $0.40/night. That's $12/month for an agent that improves itself. Worth it.
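The arithmetic behind the monthly figure, written so it can double as a budget check before a cycle runs. The per-phase costs come from the table; the 30-day month and the $0.50 nightly cap are example assumptions.

```python
# Nightly cost per phase in USD, copied from the cost table
PHASE_COSTS = {
    "scan_arxiv": 0.05,
    "scan_github": 0.03,
    "reflect": 0.15,
    "research": 0.12,
    "evaluate": 0.05,
}
DAYS_PER_MONTH = 30          # assumption: a flat 30-day month
NIGHTLY_BUDGET_USD = 0.50    # assumption: an example cap

nightly = sum(PHASE_COSTS.values())
monthly = nightly * DAYS_PER_MONTH

print(f"${nightly:.2f}/night, ${monthly:.2f}/month")  # $0.40/night, $12.00/month

# Refuse to start a cycle whose estimated cost exceeds the cap
if nightly > NIGHTLY_BUDGET_USD:
    raise SystemExit(f"Estimated ${nightly:.2f}/night exceeds the ${NIGHTLY_BUDGET_USD:.2f} cap")
```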
Phase 2: Adding Reflection
Scanning alone wasn’t enough. The agent needed to analyze its own performance data.
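
    from datetime import datetime, timedelta
    from typing import Dict, List


    class PerformanceReflector:
        """Analyzes recent agent performance."""

        def __init__(self, slow_threshold_s: float = 30.0, max_failures: int = 5,
                     baseline_duration_s: float = 10.0):
            # Illustrative defaults; tune them against your own historical logs
            self.slow_threshold_s = slow_threshold_s
            self.max_failures = max_failures
            self.baseline_duration_s = baseline_duration_s

        def analyze_logs(self, days: int = 7) -> Dict:
            """Parse and analyze recent logs."""
            # _fetch_logs and _extract_patterns depend on your logging backend (not shown)
            logs = self._fetch_logs(since=datetime.now() - timedelta(days=days))
            durations = [log.duration for log in logs]

            return {
                "total_tasks": len(logs),
                "failures": [log for log in logs if log.status == "failed"],
                "slow_tasks": [log for log in logs if log.duration > self.slow_threshold_s],
                "avg_duration": sum(durations) / len(durations) if durations else 0.0,
                "common_errors": self._extract_patterns(logs),
            }

        def identify_bottlenecks(self, analysis: Dict) -> List[str]:
            """Find areas for improvement."""
            bottlenecks = []

            if len(analysis["failures"]) > self.max_failures:
                bottlenecks.append("high_failure_rate")

            if analysis["avg_duration"] > self.baseline_duration_s:
                bottlenecks.append("performance_degradation")

            return bottlenecks

This helped the agent understand what to improve, not just randomly apply changes.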
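The reflector relies on `_extract_patterns`, which isn't shown. One possible version, assuming each log record has a `status` and an optional `error` message (the `LogEntry` dataclass is a hypothetical stand-in for your real log format): normalize away the parts that vary between otherwise identical errors, then count.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Quoted values ('user_id', "search") and bare numbers vary between otherwise
# identical errors, so both are replaced with placeholders before counting.
QUOTED = re.compile(r"(['\"]).*?\1")
NUMBER = re.compile(r"\d+")


@dataclass
class LogEntry:
    # Hypothetical record shape; status/duration match the reflector's usage
    status: str
    duration: float
    error: Optional[str] = None


def extract_patterns(logs: List[LogEntry], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the most common normalized error messages among failed tasks."""
    normalized = []
    for log in logs:
        if log.status != "failed" or not log.error:
            continue
        msg = QUOTED.sub("<str>", log.error)
        msg = NUMBER.sub("<n>", msg)
        normalized.append(msg)
    return Counter(normalized).most_common(top_n)


logs = [
    LogEntry("failed", 31.0, "Timeout after 30s calling 'search'"),
    LogEntry("failed", 46.0, "Timeout after 45s calling 'fetch'"),
    LogEntry("failed", 0.2, "KeyError: 'user_id'"),
    LogEntry("ok", 1.5),
]
print(extract_patterns(logs))
# [('Timeout after <n>s calling <str>', 2), ('KeyError: <str>', 1)]
```

Without the normalization step, `Timeout after 30s` and `Timeout after 45s` would be counted as two unrelated errors, and the real bottleneck would look half as common as it is.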
The Safety Scare
After two weeks of smooth operation, I woke up to a broken agent. It had:
- Found a “performance improvement” on GitHub
- Applied it directly to its core logic
- Broken the task execution pipeline
Everything stopped working. I had to manually revert the changes. That’s when I implemented the SafeImprovement class:

    import logging
    import subprocess
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class SafeImprovement:
        """Ensures improvements can be safely rolled back."""

        improvement_id: str
        checkpoint: Optional[str] = None

        def __post_init__(self):
            # Record the current commit before anything is touched
            self.checkpoint = self._create_checkpoint()

        def _create_checkpoint(self) -> str:
            """Create a checkpoint before applying changes."""
            result = subprocess.check_output(
                ["git", "-C", "/opt/agent", "rev-parse", "HEAD"]
            ).decode().strip()
            logging.info(f"Created checkpoint: {result}")
            return result

        def apply(self, changes: str) -> bool:
            """Apply changes with automatic rollback on failure."""
            try:
                # Stage changes (_stage_changes is your own helper, not shown)
                self._stage_changes(changes)

                # Run tests
                if not self._verify_health():
                    self.rollback()
                    return False

                return True
            except Exception as e:
                logging.error(f"Improvement failed: {e}")
                self.rollback()
                return False

        def rollback(self):
            """Roll back to the checkpoint."""
            # Note: reset --hard restores tracked files but leaves untracked files in place
            subprocess.run(
                ["git", "-C", "/opt/agent", "reset", "--hard", self.checkpoint],
                check=True,
            )
            logging.info(f"Rolled back to {self.checkpoint}")

        def _verify_health(self) -> bool:
            """Run the test suite to verify changes."""
            result = subprocess.run(
                ["pytest", "/opt/agent/tests/", "-v", "--tb=short"],
                capture_output=True,
            )
            return result.returncode == 0

Now every improvement gets a checkpoint. If tests fail, it rolls back automatically. No more bricked agents.
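`_verify_health` waits on pytest with no time limit, so a change that makes the suite hang would stall the whole night. A standalone sketch of the same check with a timeout, standard library only. The 600-second default is an arbitrary assumption, and invoking pytest as `sys.executable -m pytest` is a choice made here so the tests run under the agent's own interpreter.

```python
import subprocess
import sys
from typing import List, Optional

# Run pytest with the same interpreter as the agent; path as in SafeImprovement
DEFAULT_CHECK = [sys.executable, "-m", "pytest", "/opt/agent/tests/", "--tb=short"]


def verify_health(cmd: Optional[List[str]] = None, timeout_s: float = 600.0) -> bool:
    """Return True only if the check command exits 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd or DEFAULT_CHECK, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising the timeout
        return False
    except FileNotFoundError:
        # The command itself is missing, which is also not a healthy state
        return False
    return result.returncode == 0
```

Treating a timeout or a missing test runner as unhealthy keeps rollback the default whenever the result is ambiguous.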
The Full Dream Cycle Architecture
After several iterations, I arrived at this architecture:
| Time | Phase | What happens | Model (cost/night) |
|---|---|---|---|
| 11:15 PM | 1. Scan | Fetch arXiv papers and GitHub trending repos | Haiku ($0.05 + $0.03) |
| 11:45 PM | 2. Reflect | Analyze logs, find bottlenecks | Sonnet ($0.15) |
| 12:30 AM | 3. Research | Deep-dive into promising areas | Opus ($0.12) |
| 1:00 AM | 4. Evaluate | Score improvements, stage changes | Opus ($0.05) |
| 4:00 AM | 5. Build | Run tests, create PR if passing | Automated |

Total cost: ~$0.40/night (~$12/month).

Putting It All Together
The final DreamCycle class orchestrates everything:

    import logging
    import re
    import subprocess
    from dataclasses import dataclass
    from datetime import datetime
    from typing import Dict, List


    @dataclass
    class Improvement:
        title: str
        description: str
        impact_score: float
        risk_level: str
        staged_path: str


    class DreamCycle:
        """Orchestrates the full dream cycle."""

        def __init__(self, config: dict):
            self.config = config
            # Route tasks to appropriate models (ModelRouter wraps the model client, not shown)
            self.scanner = ModelRouter("haiku")     # Fast, cheap
            self.reflector = ModelRouter("sonnet")  # Balanced
            self.researcher = ModelRouter("opus")   # Capable, expensive

        def run(self) -> List[Improvement]:
            """Execute the full dream cycle."""
            # Phase 1: Scan sources
            scan_results = self._scan_sources()
            logging.info(f"Scanned {len(scan_results)} items")

            # Phase 2: Reflect on performance
            reflection = self._reflect_on_performance()
            logging.info(f"Identified {len(reflection.get('bottlenecks', []))} bottlenecks")

            # Phase 3: Research promising areas
            research = self._research(scan_results, reflection)
            logging.info(f"Researched {len(research)} areas")

            # Phase 4: Evaluate and stage changes
            improvements = self._evaluate_and_stage(research)
            logging.info(f"Staged {len(improvements)} improvements")

            return improvements

        def _scan_sources(self) -> List[Dict]:
            """Scan arXiv and GitHub."""
            results = []
            results.extend(self.scanner.scan_arxiv(
                query="AI agents autonomous systems", max_results=20
            ))
            results.extend(self.scanner.scan_github_trending(language="python"))
            return results

        def _reflect_on_performance(self) -> Dict:
            """Analyze recent agent performance."""
            return self.reflector.analyze_logs(days=7)

        def _research(self, scan_results: List, reflection: Dict) -> List[Dict]:
            """Deep-dive into promising areas."""
            return self.researcher.synthesize(scan_results, reflection)

        def _evaluate_and_stage(self, research: List[Dict]) -> List[Improvement]:
            """Score and stage improvements safely."""
            improvements = []
            for item in research:
                score = self.researcher.score_impact(item)
                risk = self.researcher.assess_risk(item)
                if score > self.config["min_impact_score"]:
                    improvements.append(Improvement(
                        title=item["title"],
                        description=item["description"],
                        impact_score=score,
                        risk_level=risk,
                        staged_path=self._create_staging_branch(item),
                    ))
            return improvements

        def _create_staging_branch(self, item: Dict) -> str:
            """Create a staging branch for the improvement."""
            # IDs such as arXiv URLs contain characters git rejects in branch names
            safe_id = re.sub(r"[^A-Za-z0-9._-]+", "-", str(item["id"])).strip("-.")
            branch_name = f"improvement/{datetime.now():%Y%m%d}-{safe_id}"
            # `git branch` creates the branch without switching the production
            # checkout; `checkout -b` would move /opt/agent onto it
            subprocess.run(
                ["git", "-C", "/opt/agent", "branch", branch_name], check=True
            )
            return branch_name

The Cron Configuration
Finally, the cron jobs that trigger everything:

    # /etc/cron.d/agent-dream-cycle

    # Dream cycle runs at 11:15 PM daily
    15 23 * * * agent-user /opt/agent/scripts/dream_cycle.sh

    # Build staged changes at 4 AM
    0 4 * * * agent-user /opt/agent/scripts/build_changes.sh

What I Learned
After running this for a month, here’s what actually happened:
Wins:
- The agent discovered a paper on “tool use optimization” and applied techniques that reduced task execution time by 23%
- It found a GitHub repo with better error handling patterns
- Cost stayed around $0.38-0.42/night consistently
Failures:
- One “improvement” was just removing comments to “optimize code”—not helpful
- Another tried to switch from async to sync execution (tests caught it)
- Some nights produced no improvements at all (which is fine)
The meta-learning moment: The most interesting thing happened around week three. The agent found research on “improving agent research capabilities.” It applied those techniques to its own research phase. Now it’s better at finding improvements.
This is the virtuous cycle I was hoping for.
Common Mistakes to Avoid
| Mistake | What Happens | Fix |
|---|---|---|
| No rollback mechanism | Agent breaks, manual recovery needed | Always checkpoint + automatic rollback |
| Using Opus for everything | $5/night instead of $0.40 | Route tasks to appropriate models |
| Applying changes directly | Catastrophic failures | Stage everything, require tests |
| No improvement budget | Runaway complexity | Set limits on changes per cycle |
| Skipping human review | Dangerous changes deployed | Require approval for high-risk items |
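
The last two rows of the table can be enforced with a small planning step before anything is staged. A sketch, assuming impact scores in [0, 1] and risk labels `"low"`, `"medium"`, and `"high"`; the `plan_cycle` name, the budget of 3, and the 0.5 threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    # Subset of the fields on DreamCycle's Improvement dataclass
    title: str
    impact_score: float
    risk_level: str  # assumed labels: "low", "medium", "high"


def plan_cycle(candidates: List[Candidate], max_changes: int = 3,
               min_impact: float = 0.5) -> Tuple[List[Candidate], List[Candidate]]:
    """Apply a per-cycle change budget, then split off high-risk items.

    Returns (build_queue, review_queue): build_queue goes to the automated
    test-and-PR step; review_queue waits for explicit human approval.
    """
    eligible = [c for c in candidates if c.impact_score > min_impact]
    eligible.sort(key=lambda c: c.impact_score, reverse=True)
    selected = eligible[:max_changes]  # the budget: at most max_changes per night

    build_queue = [c for c in selected if c.risk_level != "high"]
    review_queue = [c for c in selected if c.risk_level == "high"]
    return build_queue, review_queue


candidates = [
    Candidate("Cache tool results", 0.9, "low"),
    Candidate("Switch to sync execution", 0.8, "high"),
    Candidate("Better retry policy", 0.7, "medium"),
    Candidate("Rename helpers", 0.6, "low"),
    Candidate("Strip comments", 0.2, "low"),
]
build, review = plan_cycle(candidates)
print([c.title for c in build])   # ['Cache tool results', 'Better retry policy']
print([c.title for c in review])  # ['Switch to sync execution']
```

High-risk items still count against the nightly budget, so one cycle can't fill the review queue faster than you can work through it.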
When to Use This Approach
This architecture works well when:
- You have a stable agent with good test coverage
- You can afford ~$12/month for autonomous improvement
- You want continuous evolution without constant manual updates
- You’re okay with staged changes requiring human approval
It doesn’t work when:
- Your agent is still in rapid development
- You need guarantees about behavior stability
- You don’t have automated tests
- Budget is extremely tight
Final Architecture
Here’s the complete system I ended up with:
Autonomous agent system, from top to bottom (each stage feeds the next):
- Production agent: task execution, performance logging, error tracking
- Logs & metrics
- Dream cycle: [Scan] → [Reflect] → [Research] → [Evaluate] → [Stage], at ~$0.40/night
- Staging area: automated tests, human review queue, rollback checkpoints
- Improved agent

Getting Started
If you want to build something similar:
- Start with scanning only. Don’t enable autonomous changes until you trust the scanning phase.
- Add tests first. The SafeImprovement class only works if you have tests to run.
- Use model routing. This is the key to keeping costs down.
- Stage everything. Never apply changes directly to production.
- Require human approval. At least for the first month.
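To make "start with scanning only" a configuration switch rather than a code change, one option is to run the phases in order and stop at the first one that isn't enabled. This is a sketch under these assumptions: `run_enabled_phases` and the `(name, fn)` phase format are hypothetical, and the phase functions below are stubs.

```python
from typing import Any, Callable, Dict, List, Tuple

# A phase is (name, fn); fn receives the outputs of all earlier phases
Phase = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_enabled_phases(phases: List[Phase], enabled: Dict[str, bool]) -> Dict[str, Any]:
    """Run phases in order, stopping at the first one that is not enabled.

    Stopping instead of skipping means a later phase never runs without
    the output an earlier, disabled phase would have produced.
    """
    outputs: Dict[str, Any] = {}
    for name, fn in phases:
        if not enabled.get(name, False):  # off unless explicitly switched on
            break
        outputs[name] = fn(outputs)
    return outputs


phases: List[Phase] = [
    ("scan", lambda prev: ["paper-1", "paper-2"]),
    ("reflect", lambda prev: ["high_failure_rate"]),
    ("research", lambda prev: [f"{p} -> {b}" for p in prev["scan"] for b in prev["reflect"]]),
]

# Week one: scanning only
print(run_enabled_phases(phases, {"scan": True}))
# {'scan': ['paper-1', 'paper-2']}
```

As trust grows, switch on `reflect`, then `research`, and so on. Because a disabled phase stops the run, turning on `research` while `reflect` is still off has no effect.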
The Reddit poster was right—agents can dream. But like any dream, sometimes they need supervision.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 My OpenClaw agent dreams at night — and wakes up smarter
- 👨💻 arXiv API for paper scanning
- 👨💻 GitHub Trending API
- 👨💻 OpenClaw Framework
- 👨💻 Claude Code CLI
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
The dream cycle approach transformed my static agent into something that evolves. It’s not perfect—I still review the staged changes every morning—but it’s discovered improvements I never would have found on my own. The key is balancing autonomy with safety. Start small, add safeguards early, and let the agent earn your trust over time.