How to Set Up and Run autoresearch on Your GPU

Mar 29, 2026

Problem

I wanted to try Andrej Karpathy’s autoresearch framework on my RTX 4090. I cloned the repo, ran the install script, and got this:

$ python run.py --config experiment.md
ERROR: torch not found with CUDA support
ERROR: CUDA runtime not detected
Please install PyTorch with CUDA support first.

My GPU was sitting idle while autoresearch couldn’t use it. Here’s how I fixed it.

Environment

GPU: NVIDIA RTX 4090 (24GB VRAM)
OS: Ubuntu 22.04
Python: 3.10
CUDA: 12.1
PyTorch: 2.1.0+cu121

Step 1: Check Your GPU First

Before installing anything, verify your GPU is visible:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
| 30%   45C    P8    15W / 450W |      0MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

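If you see a table like this, the driver is installed and your GPU is visible. Note that the "CUDA Version" in the header is the highest CUDA version the driver supports, not a toolkit installed on your machine. If you get “command not found,” install the NVIDIA drivers first.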
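The same check can be scripted, which is handy on a remote machine. This is a minimal sketch, assuming `nvidia-smi` is on your PATH; the helper names `parse_gpu_line` and `list_gpus` are my own, while `--query-gpu` and `--format` are standard `nvidia-smi` options.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Parse one CSV line of the form 'name, memory.total, driver_version'."""
    name, memory_mib, driver = [field.strip() for field in line.split(",")]
    return {"name": name, "memory_mib": int(memory_mib), "driver": driver}


def list_gpus():
    """Return one dict per visible GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No GPU visible: install or repair the NVIDIA driver first.")
    for gpu in gpus:
        print(f"{gpu['name']}: {gpu['memory_mib']} MiB, driver {gpu['driver']}")
```

An empty list means the same thing as “command not found” above: fix the driver before going further.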

Step 2: Clone the Repository

$ git clone https://github.com/karpathy/autoresearch.git
$ cd autoresearch

Simple enough. But the tricky part comes next.

Step 3: Create Virtual Environment

I tried installing dependencies globally first. Bad idea:

$ pip install -r requirements.txt
ERROR: Cannot install torch with CUDA in global environment
HINT: Use a virtual environment

Create a clean venv:

$ python -m venv venv
$ source venv/bin/activate

Step 4: Install PyTorch with CUDA (The Critical Step)

This is where I got stuck for 2 hours. I kept installing CPU-only PyTorch:

$ pip install torch torchvision torchaudio
# Depending on your platform and pip index, this can pull a CPU-only build or one built for a different CUDA version!

Then when I ran autoresearch:

$ python run.py --config experiment.md
WARNING: CUDA not available, falling back to CPU
Training will be 50x slower...

The fix: install PyTorch with the correct CUDA index URL:

# For CUDA 12.1 (the CUDA Version shown by nvidia-smi must be 12.1 or higher)
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
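Choosing between the two index URLs can be written down as a small rule. The sketch below is deliberately conservative: it picks the newest of the two builds whose CUDA version does not exceed the one reported by `nvidia-smi`. The function names are hypothetical, and only the two URLs used in this post are listed.

```python
def parse_version(text):
    """Turn '12.2' into (12, 2) so versions compare numerically."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


# Wheel indexes mentioned in this post, newest first.
WHEEL_INDEXES = [
    ((12, 1), "https://download.pytorch.org/whl/cu121"),
    ((11, 8), "https://download.pytorch.org/whl/cu118"),
]


def pick_index_url(driver_cuda_version):
    """Return the newest listed index whose CUDA version the driver supports."""
    driver = parse_version(driver_cuda_version)
    for wheel_cuda, url in WHEEL_INDEXES:
        if wheel_cuda <= driver:
            return url
    return None  # driver is older than both builds; update the driver first


if __name__ == "__main__":
    # 12.2 is the CUDA Version from the nvidia-smi header in Step 1.
    print(pick_index_url("12.2"))
```

With the driver from Step 1 (CUDA Version 12.2), this selects the cu121 index.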

After installation, verify CUDA works:

$ python -c "import torch; print(torch.cuda.is_available())"
True

$ python -c "import torch; print(torch.cuda.get_device_name(0))"
NVIDIA GeForce RTX 4090

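If you see False, the installed PyTorch build cannot use your GPU, usually because it is CPU-only or built for a CUDA version your driver does not support. Uninstall it and reinstall with the correct CUDA index URL.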
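If you want more detail than a bare True or False, the one-liners can be combined into a small report. This is a sketch rather than anything autoresearch ships; `cuda_report` is a name I made up, and it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the installed PyTorch build."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A `built_with_cuda` of None means a CPU-only build; a version string combined with `cuda_available: False` points at a driver problem instead.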

Step 5: Install Remaining Dependencies

$ pip install -r requirements.txt

The requirements.txt in autoresearch includes:

anthropic (for Claude API)
openai (for GPT-4 API)
matplotlib (for visualizations)
numpy, tqdm, etc.

Step 6: Configure API Keys

autoresearch needs an LLM to generate experiments. Set your API key:

# Option 1: Claude (recommended)
$ export ANTHROPIC_API_KEY="sk-ant-your-key-here"

# Option 2: OpenAI
$ export OPENAI_API_KEY="sk-your-key-here"

Or add to your .bashrc for persistence:

$ echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.bashrc
$ source ~/.bashrc
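Before starting a long run, it is worth confirming that the key is actually visible to Python. A minimal sketch: the variable names come from this post, while the helper `configured_keys` is hypothetical.

```python
import os

# Environment variable names used in this post.
KEY_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def configured_keys(env=None):
    """Return the names of the API key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in KEY_NAMES if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_keys()
    if found:
        print("Found:", ", ".join(found))
    else:
        print("No API key set. Export one of:", ", ".join(KEY_NAMES))
```

The script prints only the variable names, never the key values, so it is safe to paste its output into a bug report.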

Step 7: Create Your First Experiment

autoresearch uses Markdown files to define experiments. Create one:

# Experiment: Character-Level Language Model

## Goal
Minimize validation loss for a character-level language model on Shakespeare dataset.

## Constraints
- Training time: 5 minutes per experiment
- Max parameters: 1 million
- Single GPU: RTX 4090

## Starting Architecture
- Embedding dimension: 64
- Number of attention heads: 4
- Transformer layers: 4
- Learning rate: 3e-4

## Success Metrics
- Validation perplexity < 1.5
- Training stability (no NaN losses)

The AI reads this and generates Python code automatically.
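I don't know how autoresearch parses this file internally, so the following is only a pre-flight check I would run myself: it splits the Markdown into its `##` sections and reports any of the four headings above that are missing. The names `parse_sections`, `missing_sections` and `REQUIRED` are hypothetical.

```python
# Section headings used in the example experiment file above.
REQUIRED = ("Goal", "Constraints", "Starting Architecture", "Success Metrics")


def parse_sections(markdown_text):
    """Map each '## Heading' to the list of bullet or text lines beneath it."""
    sections = {}
    current = None
    for raw in markdown_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line and current is not None:
            # Drop the leading '- ' from bullets, keep plain text lines as-is.
            sections[current].append(line[2:].strip() if line.startswith("- ") else line)
    return sections


def missing_sections(sections):
    """Return the required headings that are absent or empty."""
    return [name for name in REQUIRED if not sections.get(name)]


if __name__ == "__main__":
    example = "# Experiment: Demo\n\n## Goal\nMinimize validation loss.\n"
    print(missing_sections(parse_sections(example)))
```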

Step 8: Run autoresearch

$ python run.py --config experiment.md --timeout 8h

What happens:

Initializing autoresearch...
GPU detected: NVIDIA GeForce RTX 4090
CUDA version: 12.1
VRAM available: 24GB

Experiment 1: Baseline model
  Generated model.py (234 lines)
  Training for 5 minutes...
  Validation perplexity: 2.3
  Decision: Keep as baseline

Experiment 2: Increase embedding dim to 128
  Modified model architecture
  Training for 5 minutes...
  Validation perplexity: 1.8
  Decision: Keep (improvement detected)

Experiment 3: Add dropout 0.2
  Training for 5 minutes...
  Validation perplexity: 1.9
  Decision: Discard (no improvement)

...continuing autonomously...

The AI runs experiments, evaluates results, and decides whether to keep changes. It loops until you stop it or it reaches your goal.
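The keep-or-discard behavior in that log can be pictured as a greedy loop. This is my own simplified sketch of the idea, not the framework's code: `run_search` and `evaluate` are hypothetical names, and the perplexities are copied from the sample log instead of coming from real training.

```python
def run_search(candidates, evaluate):
    """Greedy loop: keep a change only if it lowers validation perplexity."""
    best_name, best_score = None, float("inf")
    history = []
    for name in candidates:
        score = evaluate(name)
        keep = score < best_score
        if keep:
            best_name, best_score = name, score
        history.append((name, score, "keep" if keep else "discard"))
    return best_name, best_score, history


if __name__ == "__main__":
    # Values from the sample log; a real run would train a model for each entry.
    logged = {"baseline": 2.3, "embedding dim 128": 1.8, "dropout 0.2": 1.9}
    best, score, history = run_search(logged, logged.get)
    for row in history:
        print(row)
    print("best:", best, score)
```

The first candidate always counts as "keep" because it becomes the baseline, which matches Experiment 1 in the log.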

Step 9: Monitor Progress

While autoresearch runs, I monitor:

$ watch -n 1 nvidia-smi

This shows real-time GPU utilization. During training, expect:

GPU-Util: 95-100%
Memory-Usage: 2-8GB (for small models)
Power: 200-300W

Check experiment logs:

$ tail -f logs/experiment.log
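If you would rather get an alert than stare at a terminal, the same numbers can be read from Python. This is a sketch under a few assumptions: `nvidia-smi` is on the PATH, the GPU reports power draw (some cards print `[N/A]`, which this parser does not handle), and the 90% threshold is my own pick based on the range above.

```python
import subprocess

# Standard nvidia-smi query fields: utilization (%), memory used (MiB), power (W).
QUERY = "utilization.gpu,memory.used,power.draw"


def parse_sample(line):
    """Parse one 'util, mem, power' CSV line produced with the nounits format."""
    util, mem, power = [float(field) for field in line.split(",")]
    return {"util_pct": util, "mem_mib": mem, "power_w": power}


def looks_busy(sample, min_util=90.0):
    """True if GPU utilization is at or above the threshold."""
    return sample["util_pct"] >= min_util


def read_sample():
    """Query the first GPU once; raises if nvidia-smi is missing or fails."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_sample(out.splitlines()[0])
```

Calling `looks_busy(read_sample())` every few seconds during a run is enough to notice when training has silently fallen back to CPU.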

Understanding the Output Structure

autoresearch creates a clear directory structure:

autoresearch/
├── experiments/
│   ├── exp_001/
│   │   ├── model.py       # Generated model code
│   │   ├── train.py       # Training script
│   │   └── results.json   # Metrics
│   ├── exp_002/
│   └── ...
├── logs/
│   └── experiment.log
├── checkpoints/
│   └── best_model.pt      # Best model weights
└── results/
    ├── loss_curves.png    # Visualization
    └── summary.md         # AI-generated analysis

Each experiment gets its own numbered directory, so you can reproduce any run from its saved model.py and train.py.
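With one results.json per experiment, finding the best run takes a few lines. I have not documented the schema of results.json here, so the key name `val_perplexity` is an assumption you will need to adjust; `best_experiment` is a hypothetical helper.

```python
import json
from pathlib import Path


def best_experiment(root, metric="val_perplexity"):
    """Return (experiment_dir_name, value) with the lowest metric, or None."""
    best = None
    for path in sorted(Path(root).glob("exp_*/results.json")):
        data = json.loads(path.read_text())
        if metric not in data:
            continue  # skip runs that did not record this metric
        value = float(data[metric])
        if best is None or value < best[1]:
            best = (path.parent.name, value)
    return best


if __name__ == "__main__":
    print(best_experiment("experiments"))
```

It returns None when the directory is empty or no file contains the metric, so it is safe to run before the first experiment finishes.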

What Went Wrong for Me

Mistake 1: Wrong CUDA Version

I installed CUDA 11.8 PyTorch when my driver supported CUDA 12.1:

$ pip install torch --index-url https://download.pytorch.org/whl/cu118
# Works, but slower than cu121 on RTX 4090

The fix: pick the newest PyTorch CUDA build that does not exceed the CUDA version shown by nvidia-smi (cu121 in my case).

Mistake 2: Too Large Model

I tried 10M parameters on my first run:

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
GPU 0 has a total capacty of 24.00 GiB

The fix: start with 1M parameters, increase gradually.
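To see where a configuration sits relative to the 1M budget before running it, a back-of-the-envelope parameter count helps. The sketch assumes a standard GPT-style block (about 12·d² weights per layer: 4·d² for attention, 8·d² for an MLP with 4x expansion), ignores biases, layer norms and positional embeddings, and assumes a vocabulary of 65 characters for the Shakespeare dataset. The generated model may differ from all of these.

```python
def estimate_params(d_model, n_layers, vocab_size):
    """Rough weight count for a GPT-style transformer with tied embeddings."""
    per_layer = 12 * d_model * d_model   # attention (4*d^2) + MLP (8*d^2)
    embeddings = vocab_size * d_model    # token embedding table
    return n_layers * per_layer + embeddings


if __name__ == "__main__":
    budget = 1_000_000
    for d_model in (64, 128, 256):
        n = estimate_params(d_model, n_layers=4, vocab_size=65)
        status = "ok" if n <= budget else "over budget"
        print(f"d_model={d_model}: ~{n:,} parameters ({status})")
```

Under these assumptions, the starting architecture from Step 7 (embedding dimension 64, 4 layers) comes to roughly 200K parameters, and doubling the embedding dimension to 128 stays under 1M.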

Mistake 3: No API Key

$ python run.py --config experiment.md
ERROR: No API key configured
Set ANTHROPIC_API_KEY or OPENAI_API_KEY environment variable

The fix: set the environment variable before running.

Hardware Recommendations

From my testing:

GPU	VRAM	Max Parameters	Training Time
RTX 2080 Ti	11GB	500K	5 min
RTX 3080	10GB	1M	5 min
RTX 4090	24GB	5M	5 min

For basic experiments, 10GB VRAM is enough. For larger models, 24GB+ helps.

How Many Experiments Can You Run?

With 5-minute training per experiment:

8 hours = 96 experiments
24 hours = 288 experiments
Weekend (48 hours) = 576 experiments

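That is where the “100 experiments overnight” claim comes from: an 8-hour night of 5-minute runs gives roughly 96 experiments, before counting the time the LLM spends generating code between runs. It’s achievable on consumer GPUs.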
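The arithmetic is simple enough to script if you want to plug in your own numbers. The `overhead_minutes` parameter is my addition, a placeholder for code generation and evaluation time between runs that I did not measure.

```python
def experiments_in(hours, minutes_per_run=5, overhead_minutes=0):
    """Number of complete experiments that fit in the given number of hours."""
    return int(hours * 60 // (minutes_per_run + overhead_minutes))


if __name__ == "__main__":
    for hours in (8, 24, 48):
        print(f"{hours} hours = {experiments_in(hours)} experiments")
    # With a hypothetical 1 minute of overhead per experiment:
    print("8 hours with overhead =", experiments_in(8, overhead_minutes=1))
```

With zero overhead this reproduces the 96, 288 and 576 figures above; one minute of overhead per experiment drops the 8-hour figure to 80.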

Summary

In this post, I showed how to set up autoresearch on your GPU. The key steps:

Check GPU visibility with nvidia-smi
Create a virtual environment
Install PyTorch with correct CUDA index URL (critical!)
Set your LLM API key
Write experiment goals in Markdown
Run and monitor

The trickiest part was getting PyTorch with CUDA support. Once that works, autoresearch handles the rest autonomously.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources on the topic:

👨‍💻 autoresearch - GitHub Repository
👨‍💻 Reddit Discussion: What is autoresearch?

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!