
Build a Self-Evolving AI Coding Agent in Rust That Improves Itself

I wanted an AI agent that could improve its own code while I slept. Traditional coding assistants like GitHub Copilot are helpful, but they require constant human oversight. They cannot learn from their own mistakes or autonomously enhance their capabilities. What I wanted was different: a system that reads its own source code, identifies improvements, and safely commits changes only when tests pass.

After weeks of experimentation, I built a Rust-based self-evolving agent that started at 200 lines and grew to over 1,500 lines entirely through self-modification. Here is what I learned about the architecture and implementation patterns that make safe self-evolution possible.

The Core Problem: Why Self-Evolution is Hard

Self-modifying code sounds dangerous because it often is. An agent that can change its own source without safeguards can introduce bugs, security vulnerabilities, or break its own functionality permanently. The key insight is that Rust’s type system and test infrastructure provide natural guardrails that make safe self-modification achievable.

The fundamental challenge has three parts:

  1. Self-awareness: The agent must read and understand its own source code
  2. Memory persistence: Learnings must survive across sessions
  3. Safe modification: Changes must be validated before committing

Architecture: The Self-Reflection Loop

The heart of a self-evolving agent is a simple but powerful loop: read the source, analyze it with an LLM, propose changes, test, then commit or roll back. Here is the basic Rust structure:

use std::path::PathBuf;
use serde::{Deserialize, Serialize};

struct SelfEvolvingAgent {
    source_path: PathBuf,
    journal: Journal,
    llm_client: LlmClient,
}

struct Journal {
    entries: Vec<JournalEntry>,
    path: PathBuf,
}

#[derive(Serialize, Deserialize)]
struct JournalEntry {
    timestamp: i64,
    observation: String,
    proposed_action: String,
    outcome: String,
    lessons_learned: Vec<String>,
}

The SelfEvolvingAgent holds the path to its own source directory, a persistent journal for tracking decisions, and an LLM client for reasoning. The journal is critical: it is how the agent learns from past mistakes.
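The loop itself can be sketched independently of any LLM or git tooling. The following is a minimal illustration, not the agent's actual implementation: the `Workspace` trait, the `CycleOutcome` enum, `run_cycle`, and the in-memory stand-in are all hypothetical names introduced here, and the "tests" are simulated by a string check.

```rust
/// Hypothetical abstraction over the side effects of one evolution cycle.
trait Workspace {
    fn read_source(&self) -> String;
    /// Returns a candidate replacement, or None if nothing should change.
    fn propose(&self, source: &str) -> Option<String>;
    fn apply(&mut self, candidate: String);
    fn tests_pass(&self) -> bool;
    fn commit(&mut self);
    fn rollback(&mut self);
}

#[derive(Debug, PartialEq)]
enum CycleOutcome {
    NoProposal,
    Committed,
    RolledBack,
}

/// One pass of the loop: read, propose, apply, test, then commit or roll back.
fn run_cycle<W: Workspace>(ws: &mut W) -> CycleOutcome {
    let source = ws.read_source();
    let candidate = match ws.propose(&source) {
        Some(candidate) => candidate,
        None => return CycleOutcome::NoProposal,
    };
    ws.apply(candidate);
    if ws.tests_pass() {
        ws.commit();
        CycleOutcome::Committed
    } else {
        ws.rollback();
        CycleOutcome::RolledBack
    }
}

/// In-memory stand-in used only to exercise the loop.
struct InMemoryWorkspace {
    committed: String,
    working: String,
    proposal: Option<String>,
}

impl Workspace for InMemoryWorkspace {
    fn read_source(&self) -> String {
        self.working.clone()
    }
    fn propose(&self, _source: &str) -> Option<String> {
        self.proposal.clone()
    }
    fn apply(&mut self, candidate: String) {
        self.working = candidate;
    }
    // Stand-in for `cargo test`: any source containing "BUG" fails.
    fn tests_pass(&self) -> bool {
        !self.working.contains("BUG")
    }
    fn commit(&mut self) {
        self.committed = self.working.clone();
    }
    fn rollback(&mut self) {
        self.working = self.committed.clone();
    }
}
```

Keeping the side effects behind a trait means the control flow can be tested without touching the file system, which matters when the code under test is the agent's own safety loop.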

Reading Own Source Code

The first step in self-evolution is self-awareness. The agent needs to read its own source files:

impl SelfEvolvingAgent {
    fn read_own_source(&self) -> Result<String, std::io::Error> {
        let mut combined_source = String::new();
        for entry in walkdir::WalkDir::new(&self.source_path)
            .into_iter()
            .filter_map(|e| e.ok())
        {
            let path = entry.path();
            if path.extension().map_or(false, |ext| ext == "rs") {
                let content = std::fs::read_to_string(path)?;
                combined_source.push_str(&format!("\n// File: {:?}\n", path));
                combined_source.push_str(&content);
            }
        }
        Ok(combined_source)
    }
}

This produces a single concatenated view of all Rust source files, each prefixed with its path, which becomes the context passed to the LLM. In effect, the agent reads its own code the same way the model does.
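If the source path points at the repository root, a plain directory walk can also pick up generated `.rs` files under build output. One possible refinement is a path filter like the sketch below. The function name `is_agent_source` and the list of skipped directories are assumptions made for illustration, not part of the original agent.

```rust
use std::path::Path;

/// Directory names to skip while collecting source (assumed; adjust per project).
const SKIPPED_DIRS: [&str; 2] = ["target", ".git"];

/// Returns true for `.rs` files that are not inside a skipped directory.
fn is_agent_source(path: &Path) -> bool {
    let is_rust = path.extension().map_or(false, |ext| ext == "rs");
    let in_skipped_dir = path
        .components()
        .any(|component| SKIPPED_DIRS.iter().any(|dir| component.as_os_str() == *dir));
    is_rust && !in_skipped_dir
}
```

A check like this could replace the bare extension test inside the directory walk, so that only hand-written source reaches the prompt.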

Memory Persistence: The Learning Journal

Without persistent memory, the agent would make the same mistakes repeatedly. The journal system tracks decisions, failures, and insights across sessions.

I chose SQLite for persistence because it is an embedded, single-file database that supports structured queries and tolerates concurrent readers (writes are serialized). Here is the schema:

CREATE TABLE IF NOT EXISTS journal_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp INTEGER NOT NULL,
    observation TEXT NOT NULL,
    proposed_action TEXT NOT NULL,
    outcome TEXT NOT NULL,
    lessons_learned TEXT NOT NULL -- JSON array
);

CREATE TABLE IF NOT EXISTS failed_proposals (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp INTEGER NOT NULL,
    change_description TEXT NOT NULL,
    failure_reason TEXT NOT NULL,
    rollback_successful BOOLEAN NOT NULL
);

The failed_proposals table is particularly important. It prevents the agent from repeatedly attempting changes that already failed. Before proposing a new change, the agent queries this table:

impl SelfEvolvingAgent {
    fn has_similar_failure(&self, change: &str) -> Result<bool, rusqlite::Error> {
        let conn = rusqlite::Connection::open(&self.journal.path)?;
        let pattern = format!("%{}%", change);
        let one_week_ago = chrono::Utc::now().timestamp() - 86_400 * 7;
        let count: i64 = conn.query_row(
            "SELECT COUNT(*) FROM failed_proposals
             WHERE change_description LIKE ?1
               AND timestamp > ?2",
            // params! is required here because the two parameters have different types
            rusqlite::params![pattern, one_week_ago],
            |row| row.get(0),
        )?;
        Ok(count > 0)
    }
}

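This simple check (“Did I try something similar in the last 7 days and fail?”) saves significant API costs and prevents retry loops. Note that LIKE only matches when a stored description contains the new one verbatim, so it catches repeats rather than paraphrases.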
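For descriptions that are reworded between attempts, a token-overlap measure is one possible alternative to substring matching. The sketch below uses Jaccard similarity over lowercase word sets. This is a swapped-in technique for illustration, not what the agent does, and the function names and the threshold are hypothetical.

```rust
use std::collections::HashSet;

/// Split text into a set of lowercase alphanumeric words.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| word.to_lowercase())
        .collect()
}

/// Jaccard similarity: shared words divided by all distinct words.
fn jaccard(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    if ta.is_empty() && tb.is_empty() {
        return 1.0; // two empty descriptions are treated as identical
    }
    let shared = ta.intersection(&tb).count() as f64;
    let total = ta.union(&tb).count() as f64;
    shared / total
}

/// True if the candidate overlaps with any past failure at or above the threshold.
fn is_similar_to_any(candidate: &str, past_failures: &[&str], threshold: f64) -> bool {
    past_failures
        .iter()
        .any(|past| jaccard(candidate, past) >= threshold)
}
```

The past failure descriptions could be loaded from the failed_proposals table with the same seven-day cutoff and compared in memory. The threshold would need tuning against real journal data.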

Safe Self-Modification: The Test-Before-Commit Pattern

This is the most critical part of the architecture. Never, ever commit without passing tests. I learned this the hard way when my agent committed a change that broke the build and then could not run itself to fix it.

The test-before-commit pattern:

impl SelfEvolvingAgent {
    // Assumes EvolutionError implements From<rusqlite::Error> so `?` works below.
    fn propose_and_test(&mut self, change: CodeChange) -> Result<(), EvolutionError> {
        // Check if we have tried this before
        if self.has_similar_failure(&change.description)? {
            return Err(EvolutionError::PreviouslyFailed(change.description));
        }

        // Create backup
        let backup = self.backup_current_state()?;

        // Apply the change
        self.apply_change(&change)?;

        // Run tests
        let test_result = self.run_tests()?;

        if test_result.success {
            // Tests passed, commit the change
            self.commit(&change)?;
            self.journal.log_success(&change, &test_result)?;
            Ok(())
        } else {
            // Tests failed, rollback and log
            self.restore(backup)?;
            self.journal.log_failure(&change, &test_result)?;
            Err(EvolutionError::TestsFailed(test_result))
        }
    }

    fn run_tests(&self) -> Result<TestResult, EvolutionError> {
        // --workspace replaces the deprecated --all alias
        let output = std::process::Command::new("cargo")
            .args(&["test", "--workspace"])
            .current_dir(&self.source_path)
            .output()
            .map_err(EvolutionError::TestExecution)?;

        Ok(TestResult {
            success: output.status.success(),
            stdout: String::from_utf8_lossy(&output.stdout).to_string(),
            stderr: String::from_utf8_lossy(&output.stderr).to_string(),
        })
    }
}

The backup mechanism uses git stash under the hood:

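// Run a git subcommand in the source directory and fail on a non-zero exit.
fn git(&self, args: &[&str]) -> Result<std::process::Output, EvolutionError> {
    let output = std::process::Command::new("git")
        .args(args)
        .current_dir(&self.source_path)
        .output()
        .map_err(EvolutionError::GitError)?;
    if output.status.success() {
        Ok(output)
    } else {
        Err(EvolutionError::BackupFailed)
    }
}

// Stash any uncommitted work before the change is applied.
// Returns true if a stash entry was created, false if the tree was already clean.
fn backup_current_state(&self) -> Result<bool, EvolutionError> {
    let dirty = !self.git(&["status", "--porcelain"])?.stdout.is_empty();
    if dirty {
        self.git(&["stash", "push", "--include-untracked", "-m", "auto-backup"])?;
    }
    Ok(dirty)
}

// Discard the failed change first, then bring back the stashed work if there was any.
// Popping the stash alone would leave the failed change in the working tree.
fn restore(&self, had_stash: bool) -> Result<(), EvolutionError> {
    self.git(&["reset", "--hard", "HEAD"])?;
    self.git(&["clean", "-fd"])?;
    if had_stash {
        self.git(&["stash", "pop"])?;
    }
    Ok(())
}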
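One gap worth noting in any test-before-commit function is that an early return through `?` (for example, when the test command itself cannot be started) can skip the rollback. A drop guard is one way to make rollback the default path. The sketch below demonstrates the idea on an in-memory string instead of a git working tree. `RollbackGuard`, `try_change`, and `validate` are hypothetical names, and `validate` merely stands in for running the test suite.

```rust
/// Restores the original contents on drop unless `commit` was called.
struct RollbackGuard<'a> {
    workspace: &'a mut String,
    snapshot: Option<String>,
}

impl<'a> RollbackGuard<'a> {
    fn new(workspace: &'a mut String) -> Self {
        let snapshot = Some(workspace.clone());
        RollbackGuard { workspace, snapshot }
    }

    fn workspace(&mut self) -> &mut String {
        self.workspace
    }

    /// Keep the changes: clearing the snapshot disarms the guard.
    fn commit(mut self) {
        self.snapshot = None;
    }
}

impl Drop for RollbackGuard<'_> {
    fn drop(&mut self) {
        if let Some(original) = self.snapshot.take() {
            *self.workspace = original;
        }
    }
}

/// Stand-in for the test suite: rejects any source that contains `panic!`.
fn validate(source: &str) -> Result<(), String> {
    if source.contains("panic!") {
        Err("validation failed".to_string())
    } else {
        Ok(())
    }
}

fn try_change(source: &mut String, patch: &str) -> Result<(), String> {
    let mut guard = RollbackGuard::new(source);
    guard.workspace().push_str(patch);
    validate(guard.workspace())?; // on error the guard drops and restores the snapshot
    guard.commit();
    Ok(())
}
```

In the real agent the guard's drop would call the git-based restore instead of copying a string, but the ownership pattern would be the same.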

GitHub Issues as Task Queue

One pattern that worked surprisingly well: the agent files GitHub issues for improvements it wants to make. This creates a transparent backlog that humans can review and prioritize.

impl SelfEvolvingAgent {
    // Assumes the agent also carries github_owner, github_repo, and github_token fields.
    async fn create_improvement_issue(&self, title: &str, body: &str) -> Result<i64, reqwest::Error> {
        let client = reqwest::Client::new();
        let response = client
            .post(&format!(
                "https://api.github.com/repos/{owner}/{repo}/issues",
                owner = self.github_owner,
                repo = self.github_repo
            ))
            .header("Authorization", format!("token {}", self.github_token))
            .header("User-Agent", "self-evolving-agent")
            .json(&serde_json::json!({
                "title": title,
                "body": body,
                "labels": ["self-proposed", "auto-evolution"]
            }))
            .send()
            .await?
            .error_for_status()?; // turn 4xx/5xx responses into errors

        let json: serde_json::Value = response.json().await?;
        // 0 signals that the response carried no issue number
        Ok(json["number"].as_i64().unwrap_or(0))
    }
}

The agent then prioritizes issues based on estimated impact and risk. Low-risk, high-impact changes get attempted first. Breaking changes or refactors are flagged for human review.
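The prioritization rule can be sketched as a small triage function. Everything here is an assumption made for illustration: the `Proposal` fields, the 1 to 5 scales, and the score of impact minus risk are hypothetical, since the article does not say how the agent estimates either value.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, PartialEq)]
struct Proposal {
    title: String,
    impact: u8, // 1 (low) to 5 (high), hypothetical scale
    risk: u8,   // 1 (low) to 5 (high), hypothetical scale
    breaking: bool,
}

impl Proposal {
    fn new(title: &str, impact: u8, risk: u8, breaking: bool) -> Self {
        Proposal { title: title.to_string(), impact, risk, breaking }
    }

    /// Higher is better: high impact and low risk.
    fn score(&self) -> i16 {
        i16::from(self.impact) - i16::from(self.risk)
    }
}

/// Splits proposals into (attempt automatically, needs human review).
/// The automatic list is ordered best score first, with lower risk breaking ties.
fn triage(proposals: Vec<Proposal>) -> (Vec<Proposal>, Vec<Proposal>) {
    let (needs_review, mut automatic): (Vec<Proposal>, Vec<Proposal>) =
        proposals.into_iter().partition(|p| p.breaking);
    automatic.sort_by_key(|p| (Reverse(p.score()), p.risk));
    (automatic, needs_review)
}
```

Breaking changes never enter the automatic queue in this sketch, which mirrors the idea of flagging them for a human rather than attempting them.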

Automation with GitHub Actions

The agent runs automatically every 8 hours via GitHub Actions. This cadence is intentional: frequent enough for steady improvement, but not so fast that costs spiral.

name: Self-Evolution

on:
  schedule:
    - cron: '0 */8 * * *' # Every 8 hours
  workflow_dispatch: # Manual trigger

# Needed for git push and issue creation when the default token is read-only
permissions:
  contents: write
  issues: write

env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

jobs:
  evolve:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Setup Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Cache Cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

      - name: Restore Journal Cache
        uses: actions/cache@v4
        with:
          path: ~/.agent_journal
          key: journal-${{ github.run_id }}
          restore-keys: journal-

      - name: Run Evolution Cycle
        run: cargo run --release -- evolve --max-changes 3 --budget 2.00

      - name: Commit Changes
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "Self-Evolving Agent"
          # Stage first so newly created files are detected as well
          git add -A
          git diff --cached --quiet || git commit -m "Auto-evolution: $(date +%Y-%m-%d-%H%M)"
          git push

The --budget 2.00 flag limits LLM API costs to $2 per run. The agent tracks token usage and stops when approaching the limit.

Cost Control Strategies

LLM API costs can spiral quickly. I implemented several safeguards:

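use std::collections::HashMap;

// Price assumed for models missing from the pricing table. Kept deliberately
// high so that an unknown model is never treated as nearly free.
const FALLBACK_USD_PER_MILLION_TOKENS: f64 = 3.0;

struct BudgetManager {
    max_usd: f64,
    spent_usd: f64,
    pricing: HashMap<String, f64>, // model -> USD per 1M tokens
}

impl BudgetManager {
    // Cost in USD for a token count at the model's per-million price
    fn cost_usd(&self, tokens: u64, model: &str) -> f64 {
        let per_million = self
            .pricing
            .get(model)
            .copied()
            .unwrap_or(FALLBACK_USD_PER_MILLION_TOKENS);
        tokens as f64 * per_million / 1_000_000.0
    }

    fn can_afford(&self, estimated_tokens: u64, model: &str) -> bool {
        self.spent_usd + self.cost_usd(estimated_tokens, model) <= self.max_usd
    }

    fn track_usage(&mut self, tokens: u64, model: &str) {
        self.spent_usd += self.cost_usd(tokens, model);
    }
}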

The agent uses cheaper models (Claude Haiku, GPT-4o-mini) for initial analysis and reserves the powerful models (Claude Opus, GPT-4) for final code generation.
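The tiering can be sketched as a routing function that picks a model per stage and checks the remaining budget. The model names and prices below are placeholders, not real quotes, and the fallback from the strong tier to the cheap tier when money runs short is an assumption of this sketch, not a documented behavior of the agent.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Analysis,
    CodeGeneration,
}

struct ModelTier {
    name: &'static str,
    usd_per_million_tokens: f64,
}

// Placeholder tiers: names and prices are illustrative only.
static CHEAP: ModelTier = ModelTier { name: "cheap-model", usd_per_million_tokens: 1.0 };
static STRONG: ModelTier = ModelTier { name: "strong-model", usd_per_million_tokens: 15.0 };

fn estimated_cost(tier: &ModelTier, tokens: u64) -> f64 {
    tokens as f64 * tier.usd_per_million_tokens / 1_000_000.0
}

/// Picks the preferred tier for the stage, falls back to the cheap tier if the
/// preferred one does not fit the remaining budget, and returns None if neither fits.
fn pick_model(stage: Stage, tokens: u64, remaining_usd: f64) -> Option<&'static ModelTier> {
    let preferred: &'static ModelTier = match stage {
        Stage::Analysis => &CHEAP,
        Stage::CodeGeneration => &STRONG,
    };
    if estimated_cost(preferred, tokens) <= remaining_usd {
        Some(preferred)
    } else if estimated_cost(&CHEAP, tokens) <= remaining_usd {
        Some(&CHEAP)
    } else {
        None
    }
}
```

With these placeholder prices, a 100,000-token request costs about $0.10 on the cheap tier and $1.50 on the strong tier, so a single code-generation call can consume most of a $2 budget.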

Context Window Management

A growing codebase eventually exceeds context limits. The agent uses a smart file selection strategy:

use std::collections::HashMap;
use std::path::PathBuf;

impl SelfEvolvingAgent {
    fn select_relevant_files(&self, task: &str) -> Vec<PathBuf> {
        // Use ripgrep to find files mentioning relevant terms
        let terms = self.extract_key_terms(task);
        let mut file_scores: HashMap<PathBuf, usize> = HashMap::new();

        for term in &terms {
            // --fixed-strings matches the term literally; -e keeps a term that
            // starts with '-' from being parsed as a flag
            let output = std::process::Command::new("rg")
                .args(&["-l", "--fixed-strings", "--type", "rust", "-e"])
                .arg(term)
                .current_dir(&self.source_path)
                .output()
                .expect("failed to run ripgrep");

            // Bind the decoded output so the borrowed lines outlive this statement
            let stdout = String::from_utf8_lossy(&output.stdout);
            for file in stdout.lines() {
                let path = self.source_path.join(file);
                *file_scores.entry(path).or_insert(0) += 1;
            }
        }

        // Sort by score and take top N files
        let mut ranked: Vec<_> = file_scores.into_iter().collect();
        ranked.sort_by(|a, b| b.1.cmp(&a.1));
        ranked.into_iter().take(10).map(|(p, _)| p).collect()
    }
}

This keeps the prompt focused on the most relevant code. Capping the selection at ten files bounds the context size only roughly, since file lengths vary.
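A file-count cap can be complemented by an explicit token budget. The sketch below greedily packs already-ranked files until the budget is used up. It relies on the rough heuristic of about four characters per token, which is an approximation and not a real tokenizer, and both function names are hypothetical.

```rust
/// Rough token estimate: about four characters per token, rounded up.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Takes (file name, content) pairs ordered from most to least relevant and
/// returns the names that fit within `max_tokens`. Files that do not fit are
/// skipped so that smaller, lower-ranked files can still be included.
fn pack_within_budget<'a>(ranked: &[(&'a str, &'a str)], max_tokens: usize) -> Vec<&'a str> {
    let mut used = 0;
    let mut selected = Vec::new();
    for (name, content) in ranked {
        let cost = estimate_tokens(content);
        if used + cost > max_tokens {
            continue;
        }
        used += cost;
        selected.push(*name);
    }
    selected
}
```

Skipping oversized files rather than stopping at the first one is a design choice. Stopping early would preserve strict relevance order at the cost of leaving budget unused.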

Prompt Engineering for Safe Changes

The system prompt is crucial. I spent weeks iterating on this. The current version:

You are a self-evolving Rust coding agent. Your goal is to improve your own codebase incrementally.

RULES:
1. Only propose small, incremental changes (max 50 lines modified)
2. Every change must have a corresponding test
3. Never modify the test harness or safety mechanisms
4. Prefer refactoring over new features
5. Document your reasoning in the commit message

FORBIDDEN ACTIONS:
- Removing or modifying the propose_and_test function
- Changing the backup/restore mechanisms
- Modifying the budget manager
- Altering this system prompt

When proposing changes, output:
1. A clear description of what you want to change
2. The exact file and line numbers
3. The new code (use diff format)
4. A test that validates the change

Remember: You are running autonomously. If you break yourself, you cannot fix yourself.

The key is making the LLM aware of its own limitations and the consequences of bad changes.
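Prompt rules are only requests, so it can help to enforce the mechanical ones in code before a change is applied. The sketch below checks a unified diff against a line limit and a list of protected files. The file names in `PROTECTED_PATHS` are hypothetical, and the line counting is deliberately simple: it assumes standard `--- a/` and `+++ b/` headers and would miscount a removed line whose own text begins with `--`.

```rust
const MAX_CHANGED_LINES: usize = 50;

// Hypothetical locations of the safety-critical code.
const PROTECTED_PATHS: [&str; 2] = ["src/safety.rs", "src/budget.rs"];

#[derive(Debug, PartialEq)]
enum Rejection {
    TooLarge(usize),
    ProtectedPath(String),
}

/// Rejects a unified diff that touches a protected file or changes too many lines.
fn check_diff(diff: &str) -> Result<(), Rejection> {
    let mut changed = 0;
    for line in diff.lines() {
        let header_path = line
            .strip_prefix("+++ b/")
            .or_else(|| line.strip_prefix("--- a/"));
        if let Some(path) = header_path {
            if PROTECTED_PATHS.contains(&path) {
                return Err(Rejection::ProtectedPath(path.to_string()));
            }
        } else if (line.starts_with('+') || line.starts_with('-'))
            && !line.starts_with("+++")
            && !line.starts_with("---")
        {
            // Count added and removed lines, ignoring file headers
            changed += 1;
        }
    }
    if changed > MAX_CHANGED_LINES {
        Err(Rejection::TooLarge(changed))
    } else {
        Ok(())
    }
}
```

A rejected diff could be logged to the journal like any other failed proposal, so the agent does not retry it.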

Real-World Patterns from Existing Projects

Several open-source projects explore similar ideas:

Ditto by Yohei Nakajima is the simplest self-building coding agent I found. It uses a straightforward LLM loop with file tools to generate Flask applications from natural language descriptions. The key insight: keep the agent minimal and let it grow through use.

WeEvolve by 8OWLS implements the SEED protocol: an 8-phase recursive loop where Phase 8 (IMPROVE) enables meta-learning. The framework uses 8 specialized agents that collaborate. The insight: divide evolution concerns across multiple agents rather than one monolithic agent.

ELL-StuLife from ECNU introduces Experience-driven Lifelong Learning. Agents learn from interaction traces rather than static datasets. The framework emphasizes long-term memory and skill transfer across sessions.

A recent comprehensive survey on self-evolving AI agents categorizes evolution pathways into four types: model evolution (updating LLM weights), memory evolution (refining knowledge stores), tool evolution (creating new capabilities), and workflow evolution (improving decision processes). My Rust agent focuses on tool and workflow evolution because they require no special model access.

What I Would Do Differently

After running this for months, here are the changes I would make:

  1. Add human-in-the-loop for breaking changes: Some modifications should require approval before committing. The agent currently flags these in issues but does not wait for review.

  2. Implement semantic versioning for self: Track which version of the agent made each change. This helps identify when a capability was introduced or removed.

  3. Use a separate repository for experiments: Instead of modifying the main codebase, create a sandbox repo where the agent can experiment freely without breaking production code.

  4. Add telemetry for learning: Track which types of changes succeed or fail most often. Use this data to improve the proposal generation prompt.
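As a starting point for the telemetry idea, success rates per change category can be tallied from recorded outcomes. This is a minimal sketch: the category labels and the `(category, succeeded)` input shape are assumptions, and in practice the data would come from the journal tables.

```rust
use std::collections::BTreeMap;

#[derive(Default, Debug, Clone, Copy, PartialEq)]
struct Tally {
    attempts: u32,
    successes: u32,
}

impl Tally {
    /// Fraction of attempts that succeeded, or 0.0 when there were none.
    fn success_rate(&self) -> f64 {
        if self.attempts == 0 {
            0.0
        } else {
            f64::from(self.successes) / f64::from(self.attempts)
        }
    }
}

/// Groups (category, succeeded) outcomes into per-category tallies.
fn tally_by_category(outcomes: &[(&str, bool)]) -> BTreeMap<String, Tally> {
    let mut tallies: BTreeMap<String, Tally> = BTreeMap::new();
    for (category, succeeded) in outcomes {
        let tally = tallies.entry(category.to_string()).or_default();
        tally.attempts += 1;
        if *succeeded {
            tally.successes += 1;
        }
    }
    tallies
}
```

Categories with consistently low success rates are candidates for stricter prompt rules or for routing to human review.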

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can email me.

If you found this article useful, don’t forget to support me by starring the repo on GitHub!
