How to Set Up and Run autoresearch on Your GPU?
Problem
I wanted to try Andrej Karpathy’s autoresearch framework on my RTX 4090. I cloned the repo, ran the install script, and got this:
$ python run.py --config experiment.md
ERROR: torch not found with CUDA support
ERROR: CUDA runtime not detected
Please install PyTorch with CUDA support first.
My GPU was sitting idle while autoresearch couldn’t use it. Here’s how I fixed it.
Environment
- GPU: NVIDIA RTX 4090 (24GB VRAM)
- OS: Ubuntu 22.04
- Python: 3.10
- CUDA: 12.1
- PyTorch: 2.1.0+cu121
Step 1: Check Your GPU First
Before installing anything, verify your GPU is visible:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
| 30%   45C    P8    15W / 450W |      0MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
If you see this, your GPU is ready. If you get “command not found,” install NVIDIA drivers first.
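The same check can be scripted so that a setup script fails early instead of halfway through an install. This is a minimal sketch, assuming only that `nvidia-smi` is on the PATH once the driver is installed; `gpu_visible` is a name made up for this example and is not part of autoresearch.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        # Same situation as "command not found" in the shell.
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per detected GPU
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```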
Step 2: Clone the Repository
$ git clone https://github.com/karpathy/autoresearch.git
$ cd autoresearch
Simple enough. But the tricky part comes next.
Step 3: Create Virtual Environment
I tried installing dependencies globally first. Bad idea:
$ pip install -r requirements.txt
ERROR: Cannot install torch with CUDA in global environment
HINT: Use a virtual environment
Create a clean venv:
$ python -m venv venv
$ source venv/bin/activate
Step 4: Install PyTorch with CUDA (The Critical Step)
This is where I got stuck for 2 hours. I kept installing CPU-only PyTorch:
$ pip install torch torchvision torchaudio
# In my case this pulled in a CPU-only build!
Then when I ran autoresearch:
$ python run.py --config experiment.md
WARNING: CUDA not available, falling back to CPU
Training will be 50x slower...
The fix: install PyTorch with the correct CUDA index URL:
# For CUDA 12.1 (check your nvidia-smi output)
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
After installation, verify CUDA works:
$ python -c "import torch; print(torch.cuda.is_available())"
True
$ python -c "import torch; print(torch.cuda.get_device_name(0))"
NVIDIA GeForce RTX 4090
If you see False, you installed the wrong PyTorch build. Reinstall with the correct CUDA index URL.
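If you would rather run one script than several one-liners, the checks can be collected into a small report. This is a sketch, not something shipped with autoresearch; `cuda_report` is a hypothetical helper name. It relies on `torch.version.cuda` being `None` for CPU-only builds, which makes the "wrong build" case easy to spot.

```python
def cuda_report() -> dict:
    """Collect what PyTorch sees, without crashing if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the installed wheel is a CPU-only build.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None`, reinstalling with the CUDA index URL is the fix; if it is set but `cuda_available` is still `False`, the driver side is the more likely culprit.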
Step 5: Install Remaining Dependencies
$ pip install -r requirements.txt
The requirements.txt in autoresearch includes:
- anthropic (for Claude API)
- openai (for GPT-4 API)
- matplotlib (for visualizations)
- numpy, tqdm, etc.
Step 6: Configure API Keys
autoresearch needs an LLM to generate experiments. Set your API key:
# Option 1: Claude (recommended)
$ export ANTHROPIC_API_KEY="sk-ant-your-key-here"
# Option 2: OpenAI
$ export OPENAI_API_KEY="sk-your-key-here"
Or add to your .bashrc for persistence:
$ echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.bashrc
$ source ~/.bashrc
Step 7: Create Your First Experiment
autoresearch uses Markdown files to define experiments. Create one:
# Experiment: Character-Level Language Model
## Goal
Minimize validation loss for a character-level language model on Shakespeare dataset.
## Constraints
- Training time: 5 minutes per experiment
- Max parameters: 1 million
- Single GPU: RTX 4090
## Starting Architecture
- Embedding dimension: 64
- Number of attention heads: 4
- Transformer layers: 4
- Learning rate: 3e-4
## Success Metrics
- Validation perplexity < 1.5
- Training stability (no NaN losses)
The AI reads this and generates Python code automatically.
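To sanity-check an experiment file before handing it to the LLM, it helps to confirm that every section you expect is actually there. The sketch below is purely illustrative: it is not how autoresearch reads the file (the LLM consumes the Markdown directly), and `parse_experiment` is a hypothetical helper that only understands `#` titles, `##` sections, and `-` bullets.

```python
def parse_experiment(markdown: str) -> dict:
    """Split an experiment file into a title and a dict of section lines."""
    spec = {"title": None, "sections": {}}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            spec["sections"][current] = []
        elif line.startswith("# "):
            spec["title"] = line[2:].strip()
        elif line and current is not None:
            # Drop the bullet marker if present, keep plain sentences as-is.
            item = line[2:] if line.startswith("- ") else line
            spec["sections"][current].append(item.strip())
    return spec


if __name__ == "__main__":
    sample = "# Experiment: Demo\n## Constraints\n- Max parameters: 1 million\n"
    print(parse_experiment(sample))
```

A quick check such as `"Success Metrics" in spec["sections"]` catches a misspelled heading before you burn an overnight run on it.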
Step 8: Run autoresearch
$ python run.py --config experiment.md --timeout 8h
What happens:
Initializing autoresearch...
GPU detected: NVIDIA GeForce RTX 4090
CUDA version: 12.1
VRAM available: 24GB
Experiment 1: Baseline model
  Generated model.py (234 lines)
  Training for 5 minutes...
  Validation perplexity: 2.3
  Decision: Keep as baseline
Experiment 2: Increase embedding dim to 128
  Modified model architecture
  Training for 5 minutes...
  Validation perplexity: 1.8
  Decision: Keep (improvement detected)
Experiment 3: Add dropout 0.2
  Training for 5 minutes...
  Validation perplexity: 1.9
  Decision: Discard (no improvement)
...continuing autonomously...
The AI runs experiments, evaluates results, and decides whether to keep changes. It loops until you stop it or it reaches your goal.
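The keep-or-discard logic in that log boils down to a greedy loop over a single metric. Here is a minimal sketch of that idea, fed with the three perplexities from the log above. It is my own simplification, not the actual autoresearch code: `run_loop` is a made-up name, and the real framework trains a model where this sketch just reads a number.

```python
def run_loop(results, target):
    """Greedy keep/discard over (description, val_perplexity) pairs.

    Stops early once the best perplexity drops below `target`.
    """
    best = None
    log = []
    for description, ppl in results:
        if best is None:
            decision, best = "baseline", ppl
        elif ppl < best:
            decision, best = "keep", ppl
        else:
            decision = "discard"
        log.append((description, ppl, decision))
        if best < target:
            break
    return best, log


if __name__ == "__main__":
    experiments = [
        ("Baseline model", 2.3),
        ("Increase embedding dim to 128", 1.8),
        ("Add dropout 0.2", 1.9),
    ]
    best, log = run_loop(experiments, target=1.5)
    for description, ppl, decision in log:
        print(f"{description}: {ppl} -> {decision}")
    print("best so far:", best)
```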
Step 9: Monitor Progress
While autoresearch runs, I monitor:
$ watch -n 1 nvidia-smi
Shows real-time GPU utilization. During training, expect:
- GPU-Util: 95-100%
- Memory-Usage: 2-8GB (for small models)
- Power: 200-300W
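If you want those three numbers in a script rather than on screen, `nvidia-smi` can emit them as CSV through its `--query-gpu` option. The sketch below parses that output; the function names and dictionary keys are my own choices, and fields the GPU does not report (shown as `[N/A]`) become `None`.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,power.draw"


def _to_float(field: str):
    """Convert one CSV field to float, or None if the GPU reports [N/A]."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_line(line: str) -> dict:
    """Parse one 'util, mem, power' line from nvidia-smi CSV output."""
    util, mem, power = (field.strip() for field in line.split(","))
    return {
        "util_pct": _to_float(util),
        "mem_mib": _to_float(mem),
        "power_w": _to_float(power),
    }


def read_gpu_stats() -> list:
    """Return one stats dict per GPU. Requires nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpu_stats()` in a loop with `time.sleep(1)` gives roughly what `watch -n 1 nvidia-smi` shows, in a form you can log or alert on.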
Check experiment logs:
$ tail -f logs/experiment.log
Understanding the Output Structure
autoresearch creates a clear directory structure:
autoresearch/
├── experiments/
│   ├── exp_001/
│   │   ├── model.py        # Generated model code
│   │   ├── train.py        # Training script
│   │   └── results.json    # Metrics
│   ├── exp_002/
│   └── ...
├── logs/
│   └── experiment.log
├── checkpoints/
│   └── best_model.pt       # Best model weights
└── results/
    ├── loss_curves.png     # Visualization
    └── summary.md          # AI-generated analysis
Each experiment is versioned. You can reproduce any experiment by looking at its code.
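With one `results.json` per experiment, finding the best run is a short directory scan. The sketch below assumes each file is a flat JSON object with a `val_perplexity` key; I have not confirmed the real schema, so treat that key name (and the `best_experiment` helper itself) as placeholders and adjust to whatever your files contain.

```python
import json
from pathlib import Path


def best_experiment(root, metric="val_perplexity"):
    """Return (experiment_dir_name, value) with the lowest metric, or None."""
    best = None
    for path in sorted(Path(root).glob("exp_*/results.json")):
        try:
            value = json.loads(path.read_text())[metric]
        except (json.JSONDecodeError, KeyError):
            continue  # skip unreadable files or files without the metric
        if best is None or value < best[1]:
            best = (path.parent.name, value)
    return best


if __name__ == "__main__":
    print(best_experiment("experiments"))
```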
What Went Wrong for Me
Mistake 1: Wrong CUDA Version
I installed CUDA 11.8 PyTorch when my driver supported CUDA 12.1:
$ pip install torch --index-url https://download.pytorch.org/whl/cu118
# Works, but slower than cu121 on RTX 4090
The fix: match your nvidia-smi CUDA version.
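One subtlety: the "CUDA Version" in the `nvidia-smi` header is the highest CUDA version the driver supports, so "matching" really means picking the newest PyTorch build that does not exceed it. My driver reports 12.2, and cu121 is the right choice. A small sketch of that rule follows; `KNOWN_TAGS` only lists the two builds mentioned in this post, so extend it yourself for other versions.

```python
import re

# Only the two builds used in this post; not a complete list of PyTorch wheels.
KNOWN_TAGS = ("11.8", "12.1")


def driver_cuda_version(smi_text: str):
    """Extract (major, minor) from the 'CUDA Version: X.Y' header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def pick_wheel_tag(driver_version, known=KNOWN_TAGS):
    """Newest known build that the driver can run, e.g. (12, 2) -> 'cu121'."""
    candidates = [tuple(int(part) for part in v.split(".")) for v in known]
    usable = [c for c in candidates if c <= driver_version]
    if not usable:
        return None
    major, minor = max(usable)
    return f"cu{major}{minor}"


if __name__ == "__main__":
    header = "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |"
    tag = pick_wheel_tag(driver_cuda_version(header))
    print(f"--index-url https://download.pytorch.org/whl/{tag}")
```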
Mistake 2: Too Large Model
I tried 10M parameters on my first run:
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB
GPU 0 has a total capacty of 24.00 GiB
The fix: start with 1M parameters, increase gradually.
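To know roughly how many parameters a configuration has before training it, a back-of-the-envelope estimate is enough. The sketch below assumes a GPT-style decoder (attention plus an MLP with 4x hidden width, learned position embeddings) and ignores biases and LayerNorm. The vocabulary size of 65 and context length of 256 are my assumptions for a character-level Shakespeare setup, not values taken from autoresearch.

```python
def estimate_params(n_layer: int, d_model: int, vocab_size: int, block_size: int) -> int:
    """Rough parameter count for a GPT-style transformer.

    Per layer: 4*d^2 for attention (Q, K, V, output projection)
             + 8*d^2 for the MLP (d -> 4d -> d).
    Biases and LayerNorm weights are ignored.
    """
    per_layer = 12 * d_model ** 2
    embeddings = (vocab_size + block_size) * d_model  # token + position tables
    return n_layer * per_layer + embeddings


if __name__ == "__main__":
    # Starting architecture from the experiment file: 4 layers, embedding dim 64.
    print(estimate_params(n_layer=4, d_model=64, vocab_size=65, block_size=256))
    # Same depth with the embedding dimension doubled to 128.
    print(estimate_params(n_layer=4, d_model=128, vocab_size=65, block_size=256))
```

Under these assumptions both configurations stay below the 1 million parameter cap, and the count grows roughly with the square of the embedding dimension, so doubling the width costs about four times the parameters.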
Mistake 3: No API Key
$ python run.py --config experiment.md
ERROR: No API key configured
Set ANTHROPIC_API_KEY or OPENAI_API_KEY environment variable
The fix: set the environment variable before running.
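A pre-flight check for this is a few lines of Python. This is a sketch using the two variable names from the error message; `configured_keys` is a hypothetical helper, and it only checks that a key is set, not that it is valid.

```python
import os

KEY_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def configured_keys(env=None) -> list:
    """Return the names of the API key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in KEY_NAMES if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_keys()
    if found:
        print("API key(s) found:", ", ".join(found))
    else:
        print("No API key set. Export one of:", ", ".join(KEY_NAMES))
```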
Hardware Recommendations
From my testing:
| GPU | VRAM | Max Parameters | Training Time |
|---|---|---|---|
| RTX 2080 Ti | 11GB | 500K | 5 min |
| RTX 3080 | 10GB | 1M | 5 min |
| RTX 4090 | 24GB | 5M | 5 min |
For basic experiments, 10GB VRAM is enough. For larger models, 24GB+ helps.
How Many Experiments Can You Run?
With 5-minute training per experiment:
8 hours = 96 experiments
24 hours = 288 experiments
Weekend (48 hours) = 576 experiments
That’s the “100 experiments overnight” claim. It’s achievable on consumer GPUs.
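Those figures assume every minute goes into training. In practice each experiment also spends time on code generation and evaluation, so here is the same arithmetic with an optional per-experiment overhead. The one-minute overhead in the example is an illustrative guess, not a measurement.

```python
def experiments_in(hours: float, train_minutes: float = 5.0, overhead_minutes: float = 0.0) -> int:
    """Number of complete experiments that fit in the given wall-clock time."""
    return int(hours * 60 // (train_minutes + overhead_minutes))


if __name__ == "__main__":
    for hours in (8, 24, 48):
        ideal = experiments_in(hours)
        with_overhead = experiments_in(hours, overhead_minutes=1.0)
        print(f"{hours}h: {ideal} ideal, {with_overhead} with 1 min overhead each")
```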
Summary
In this post, I showed how to set up autoresearch on your GPU. The key steps:
- Check GPU visibility with nvidia-smi
- Create a virtual environment
- Install PyTorch with correct CUDA index URL (critical!)
- Set your LLM API key
- Write experiment goals in Markdown
- Run and monitor
The trickiest part was getting PyTorch with CUDA support. Once that works, autoresearch handles the rest autonomously.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!