
How to Set Up and Run autoresearch on Your GPU

Problem

I wanted to try Andrej Karpathy’s autoresearch framework on my RTX 4090. I cloned the repo, ran the install script, and got this:

First attempt
$ python run.py --config experiment.md
ERROR: torch not found with CUDA support
ERROR: CUDA runtime not detected
Please install PyTorch with CUDA support first.

My GPU was sitting idle while autoresearch couldn’t use it. Here’s how I fixed it.

Environment

  • GPU: NVIDIA RTX 4090 (24GB VRAM)
  • OS: Ubuntu 22.04
  • Python: 3.10
  • CUDA: 12.1
  • PyTorch: 2.1.0+cu121

Step 1: Check Your GPU First

Before installing anything, verify your GPU is visible:

Check GPU visibility
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
| 30%   45C    P8    15W / 450W |      0MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

If you see this, your GPU is ready. If you get “command not found,” install NVIDIA drivers first.
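If you would rather script this check, here is a minimal Python sketch. It relies on the `nvidia-smi -L` listing mode; the helper names `parse_gpu_list` and `list_gpus` are my own and not part of autoresearch.

```python
import re
import shutil
import subprocess


def parse_gpu_list(text: str) -> list[str]:
    """Extract GPU names from `nvidia-smi -L` style output."""
    # Each line looks like: "GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-...)"
    return re.findall(r"^GPU \d+: (.+?) \(UUID:", text, flags=re.MULTILINE)


def list_gpus() -> list[str]:
    """Return visible GPU names, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed or not on PATH
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []  # nvidia-smi exists but cannot talk to the driver
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No NVIDIA GPU visible; install or repair the driver first.")
```

An empty list covers both failure cases (no `nvidia-smi` on the PATH, or a driver that is not responding), which is enough to decide whether to continue with the setup.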

Step 2: Clone the Repository

Clone autoresearch
$ git clone https://github.com/karpathy/autoresearch.git
$ cd autoresearch

Simple enough. But the tricky part comes next.

Step 3: Create Virtual Environment

I tried installing dependencies globally first. Bad idea:

Failed global install
$ pip install -r requirements.txt
ERROR: Cannot install torch with CUDA in global environment
HINT: Use a virtual environment

Create a clean venv:

Create venv
$ python -m venv venv
$ source venv/bin/activate

Step 4: Install PyTorch with CUDA (The Critical Step)

This is where I got stuck for 2 hours. I kept installing CPU-only PyTorch:

Wrong: CPU-only PyTorch
$ pip install torch torchvision torchaudio
# Without an explicit index URL, pip takes whatever build PyPI serves for your platform, which may be CPU-only

Then when I ran autoresearch:

CUDA not detected
$ python run.py --config experiment.md
WARNING: CUDA not available, falling back to CPU
Training will be 50x slower...

The fix: install PyTorch with the correct CUDA index URL:

Correct: CUDA-enabled PyTorch
# For CUDA 12.1 builds (nvidia-smi shows the highest CUDA version your driver supports)
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

After installation, verify CUDA works:

Verify PyTorch CUDA
$ python -c "import torch; print(torch.cuda.is_available())"
True
$ python -c "import torch; print(torch.cuda.get_device_name(0))"
NVIDIA GeForce RTX 4090

If you see False, you most likely have a CPU-only build, or a CUDA build newer than your driver supports. Reinstall with the index URL that matches your setup.
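To see why the check fails, it helps to print a bit more than a single True or False. The sketch below gathers the relevant facts in one place; `cuda_report` is a name I made up, and the `torch` calls it uses are the standard ones (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, `torch.cuda.get_device_properties()`).

```python
import importlib.util


def cuda_report() -> dict:
    """Collect the facts needed to tell a CPU-only build from a CUDA build."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__      # e.g. "2.1.0+cu121"
    report["built_with_cuda"] = torch.version.cuda   # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        props = torch.cuda.get_device_properties(0)
        report["vram_gib"] = round(props.total_memory / 2**30, 1)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the installed wheel is CPU-only. If it shows a version but `cuda_available` is `False`, the problem is more likely the driver.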

Step 5: Install Remaining Dependencies

Install requirements
$ pip install -r requirements.txt

The requirements.txt in autoresearch includes:

  • anthropic (for Claude API)
  • openai (for GPT-4 API)
  • matplotlib (for visualizations)
  • numpy, tqdm, etc.

Step 6: Configure API Keys

autoresearch needs an LLM to generate experiments. Set your API key:

Set API key
# Option 1: Claude (recommended)
$ export ANTHROPIC_API_KEY="sk-ant-your-key-here"
# Option 2: OpenAI
$ export OPENAI_API_KEY="sk-your-key-here"

Or add it to your ~/.bashrc so it persists across sessions:

Permanent API key
$ echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.bashrc
$ source ~/.bashrc
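Before starting a long run, a quick pre-flight check can confirm that the key is actually visible to Python. This is a small sketch of my own, not something autoresearch ships; it only reports which variable names are set and never prints the secret itself.

```python
import os

# Variable names taken from this post.
KEY_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def configured_keys(env=None) -> list[str]:
    """Return the names of the API key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in KEY_NAMES if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_keys()
    if found:
        print("Found:", ", ".join(found))  # names only, never the key value
    else:
        print("No API key set; export one of:", ", ".join(KEY_NAMES))
```

Passing a plain dict as `env` makes the function easy to test without touching the real environment.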

Step 7: Create Your First Experiment

autoresearch uses Markdown files to define experiments. Create one:

experiment.md
# Experiment: Character-Level Language Model
## Goal
Minimize validation loss for a character-level language model on the Shakespeare dataset.
## Constraints
- Training time: 5 minutes per experiment
- Max parameters: 1 million
- Single GPU: RTX 4090
## Starting Architecture
- Embedding dimension: 64
- Number of attention heads: 4
- Transformer layers: 4
- Learning rate: 3e-4
## Success Metrics
- Validation perplexity < 1.5
- Training stability (no NaN losses)

The AI reads this and generates Python code automatically.
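To get a feel for how little structure such a file needs, here is a sketch that splits it into sections. This is purely illustrative: `parse_experiment` is a hypothetical helper, and I am not claiming autoresearch parses the file this way (the LLM may simply read the raw text).

```python
def parse_experiment(markdown: str) -> dict:
    """Split an experiment file into {section heading: [content lines]}."""
    sections = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# "):
            sections["title"] = [line[2:].strip()]
        elif line and current is not None:
            # Drop the bullet marker so "- Max parameters: 1 million"
            # becomes "Max parameters: 1 million".
            sections[current].append(line.removeprefix("- ").strip())
    return sections


SAMPLE = """# Experiment: Character-Level Language Model
## Goal
Minimize validation loss for a character-level language model.
## Constraints
- Training time: 5 minutes per experiment
- Max parameters: 1 million
"""

if __name__ == "__main__":
    spec = parse_experiment(SAMPLE)
    print(spec["title"])
    print(spec["Constraints"])
```

Keeping goals and constraints under separate headings makes it easy to check, by eye or by script, that every run was given a time and size budget.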

Step 8: Run autoresearch

Run experiment
$ python run.py --config experiment.md --timeout 8h

What happens:

autoresearch output
Initializing autoresearch...
GPU detected: NVIDIA GeForce RTX 4090
CUDA version: 12.1
VRAM available: 24GB
Experiment 1: Baseline model
Generated model.py (234 lines)
Training for 5 minutes...
Validation perplexity: 2.3
Decision: Keep as baseline
Experiment 2: Increase embedding dim to 128
Modified model architecture
Training for 5 minutes...
Validation perplexity: 1.8
Decision: Keep (improvement detected)
Experiment 3: Add dropout 0.2
Training for 5 minutes...
Validation perplexity: 1.9
Decision: Discard (no improvement)
...continuing autonomously...

The AI runs experiments, evaluates results, and decides whether to keep changes. It loops until you stop it or it reaches your goal.
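The keep-or-discard logic in that log amounts to a greedy search: a change survives only if it beats the best result so far. Here is a minimal sketch of that decision rule, assuming lower validation perplexity is better. The function name `greedy_search` is mine, and the real framework may decide differently.

```python
def greedy_search(baseline: float, candidates):
    """Keep a change only if it lowers the best validation perplexity so far."""
    best = baseline
    history = []
    for name, perplexity in candidates:
        keep = perplexity < best
        if keep:
            best = perplexity
        history.append((name, perplexity, "keep" if keep else "discard"))
    return best, history


if __name__ == "__main__":
    # Perplexities copied from the sample log; in a real run each value
    # would come from training the modified model.
    best, history = greedy_search(
        2.3, [("embedding dim 128", 1.8), ("dropout 0.2", 1.9)]
    )
    for name, perplexity, decision in history:
        print(f"{name}: {perplexity} -> {decision}")
    print("best:", best)
```

One consequence of this rule is that a change that only helps in combination with another change gets discarded, so the order in which ideas are tried matters.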

Step 9: Monitor Progress

While autoresearch runs, I keep an eye on the GPU:

Monitor GPU usage
$ watch -n 1 nvidia-smi

This shows real-time GPU utilization. During training, expect:

  • GPU-Util: 95-100%
  • Memory-Usage: 2-8GB (for small models)
  • Power: 200-300W
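If you want these three numbers in a script (for example to log them next to each experiment), `nvidia-smi` has a CSV query mode. The sketch below uses its `--query-gpu` and `--format` options; the function names and dictionary keys are my own choices.

```python
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,power.draw"


def parse_sample(line: str):
    """Parse one CSV line from nvidia-smi, e.g. '97, 6144, 285.40'."""
    try:
        util, mem, power = (float(field.strip()) for field in line.split(","))
    except ValueError:
        return None  # some GPUs report "[N/A]" for individual fields
    return {"util_pct": util, "mem_mib": mem, "power_w": power}


def sample_gpu(index: int = 0):
    """Return one reading for the given GPU, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None
    cmd = [
        "nvidia-smi",
        f"--id={index}",
        f"--query-gpu={QUERY}",
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return parse_sample(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    print(sample_gpu())
```

Calling `sample_gpu()` in a loop with a short sleep gives roughly the same view as `watch`, but in a form you can write to a file.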

Check experiment logs:

Check logs
$ tail -f logs/experiment.log

Understanding the Output Structure

autoresearch creates a clear directory structure:

Output structure
autoresearch/
├── experiments/
│   ├── exp_001/
│   │   ├── model.py       # Generated model code
│   │   ├── train.py       # Training script
│   │   └── results.json   # Metrics
│   ├── exp_002/
│   └── ...
├── logs/
│   └── experiment.log
├── checkpoints/
│   └── best_model.pt      # Best model weights
└── results/
    ├── loss_curves.png    # Visualization
    └── summary.md         # AI-generated analysis

Each experiment is versioned in its own directory. You can reproduce any experiment by rerunning the code saved there.
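With one `results.json` per experiment, finding the best run takes only a few lines. The sketch below assumes each file is a flat JSON object with a `val_perplexity` key. That key name is my guess, so check an actual `results.json` and adjust `metric` accordingly.

```python
import json
from pathlib import Path


def best_experiment(root, metric="val_perplexity"):
    """Return (experiment directory name, value) for the lowest metric, or None."""
    best = None
    for path in sorted(Path(root).glob("exp_*/results.json")):
        data = json.loads(path.read_text())
        if metric not in data:
            continue  # skip runs that ended before reporting the metric
        if best is None or data[metric] < best[1]:
            best = (path.parent.name, data[metric])
    return best


if __name__ == "__main__":
    # Run from the autoresearch directory; prints None if nothing is found.
    print(best_experiment("experiments"))
```

Because the generated `model.py` and `train.py` sit next to each result, the returned directory name is all you need to find the code behind the best score.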

What Went Wrong for Me

Mistake 1: Wrong CUDA Version

I installed CUDA 11.8 PyTorch when my driver supported CUDA 12.1:

Version mismatch
$ pip install torch --index-url https://download.pytorch.org/whl/cu118
# Works, but slower than cu121 on RTX 4090

The fix: pick the newest PyTorch CUDA build that does not exceed the CUDA version shown by nvidia-smi (cu121 in my case).
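The selection rule can be written down in a few lines. This sketch only knows the two index URLs used in this post and applies a deliberately conservative rule (never pick a build newer than the driver reports); `pick_index_url` is a hypothetical helper, not a PyTorch or pip feature.

```python
# The two wheel indexes used in this post, keyed by (major, minor) CUDA version.
INDEX_URLS = {
    (12, 1): "https://download.pytorch.org/whl/cu121",
    (11, 8): "https://download.pytorch.org/whl/cu118",
}


def pick_index_url(driver_cuda: str):
    """Newest listed CUDA build that does not exceed the driver's CUDA version."""
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    usable = [version for version in INDEX_URLS if version <= (major, minor)]
    return INDEX_URLS[max(usable)] if usable else None


if __name__ == "__main__":
    # "12.2" is the "CUDA Version" field from the nvidia-smi header in Step 1.
    print(pick_index_url("12.2"))
```

For a driver that reports 12.2 this returns the cu121 URL. For a driver older than 11.8 it returns `None`, which means updating the driver comes first.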

Mistake 2: Too Large Model

I tried 10M parameters on my first run:

OOM error
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
GPU 0 has a total capacty of 24.00 GiB

The fix: start with 1M parameters, increase gradually.
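To check a configuration against a parameter budget before training, a rough count is enough. The sketch below uses the usual approximation for a GPT-style decoder (about 12 × d_model² weights per layer, plus embeddings). It ignores biases and LayerNorm, and assumes the output head shares weights with the token embedding. The `vocab_size` and `block_size` values are my assumptions for a character-level Shakespeare setup.

```python
def transformer_params(d_model: int, n_layers: int, vocab_size: int, block_size: int) -> int:
    """Rough parameter count for a GPT-style decoder (biases and LayerNorm ignored)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    mlp = 8 * d_model * d_model         # two linear layers with 4x expansion
    embeddings = (vocab_size + block_size) * d_model  # token + learned position tables
    return n_layers * (attention + mlp) + embeddings


if __name__ == "__main__":
    # Starting architecture from the experiment file.
    print(transformer_params(d_model=64, n_layers=4, vocab_size=65, block_size=256))
    # A hypothetical larger configuration that lands near 10M parameters.
    print(transformer_params(d_model=256, n_layers=12, vocab_size=65, block_size=256))
```

The starting architecture comes out at 217,152 parameters, well inside the 1M budget, while the second configuration is about 9.5M. Note that this counts weights only: activation memory grows with batch size and sequence length and is not captured here.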

Mistake 3: No API Key

Missing API key
$ python run.py --config experiment.md
ERROR: No API key configured
Set ANTHROPIC_API_KEY or OPENAI_API_KEY environment variable

The fix: set the environment variable before running.

Hardware Recommendations

From my testing:

GPU           VRAM    Max Parameters    Training Time
RTX 2080 Ti   11GB    500K              5 min
RTX 3080      10GB    1M                5 min
RTX 4090      24GB    5M                5 min

For basic experiments, 10GB VRAM is enough. For larger models, 24GB+ helps.

How Many Experiments Can You Run?

With 5-minute training per experiment:

8 hours = 96 experiments
24 hours = 288 experiments
Weekend (48 hours) = 576 experiments

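That’s where the “100 experiments overnight” claim comes from: 8 hours at 5 minutes per run gives 96 experiments, before counting any time the LLM spends generating code. It’s achievable on consumer GPUs.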
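The arithmetic is simple enough to put in a helper, which also makes it easy to see what per-experiment overhead does to the count. The `overhead_minutes` parameter is my addition, and the 1-minute value in the example is an illustration, not a measurement.

```python
def experiments_in(hours: float, train_minutes: float = 5.0, overhead_minutes: float = 0.0) -> int:
    """Number of whole experiments that fit in the given wall-clock time."""
    return int(hours * 60 // (train_minutes + overhead_minutes))


if __name__ == "__main__":
    for hours in (8, 24, 48):
        print(hours, "hours:", experiments_in(hours))
    # Same 8 hours with a hypothetical 1 minute of overhead per experiment.
    print("8 hours with overhead:", experiments_in(8, overhead_minutes=1.0))
```

With zero overhead this reproduces 96, 288 and 576. With one extra minute per experiment, 8 hours drops to 80 runs, so the figures above are best read as upper bounds.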

Summary

In this post, I showed how to set up autoresearch on your GPU. The key steps:

  1. Check GPU visibility with nvidia-smi
  2. Create a virtual environment
  3. Install PyTorch with the correct CUDA index URL (critical!)
  4. Set your LLM API key
  5. Write experiment goals in Markdown
  6. Run and monitor

The trickiest part was getting PyTorch with CUDA support. Once that works, autoresearch handles the rest autonomously.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
