How to Set Up GPU Passthrough for AI Agent VMs: Native Performance with Full Isolation
The Problem
I wanted to run AI agents in a virtual machine for isolation, but the GPU performance was terrible. Inside the VM, my RTX 3080 that normally handles LLM inference at 80 tokens/second was crawling at 5 tokens/second. The virtual GPU drivers just couldn’t deliver.
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Bare Metal:   RTX 3080    → 80 tokens/sec (100%)           │
│                                                             │
│  Standard VM:  Virtual GPU →  5 tokens/sec (6%)             │
│                                                             │
│  The difference: 16x slower in VM                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
I needed GPU performance inside the VM. That’s when I discovered GPU passthrough via VFIO.
Why This Matters
Running AI agents in a VM makes sense for security:
- Isolated environment for untrusted code execution
- Separate credentials and “burner accounts” inside the VM
- Easy snapshot and rollback if something goes wrong
- Complete separation from your main system
But AI workloads need GPU acceleration. Without it, running local LLMs or GPU-accelerated agents becomes impractical.
GPU passthrough solves this by giving the VM direct access to your physical GPU. The Reddit community on r/ClaudeAI confirmed this approach:
"with the GPU passthrough it doesn't feel like a VM"
"the vm is my main pc now for anything involving ai"
"I use burner accounts for credentials inside the VM"
Environment
- Host OS: Ubuntu 22.04 LTS
- GPU: NVIDIA RTX 3080
- CPU: AMD Ryzen 9 (Intel works too, settings differ)
- VM Software: KVM/QEMU with libvirt
- Guest OS: Ubuntu 22.04
How GPU Passthrough Works
VFIO (Virtual Function I/O) is a Linux kernel framework that lets userspace programs safely access hardware devices. For GPU passthrough:
┌─────────────────────────────────────────────────────────────┐
│                         Host System                         │
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │   Host OS   │    │    VFIO     │    │    Guest    │      │
│  │             │    │   Driver    │    │     VM      │      │
│  │  Uses iGPU  │    │             │    │             │      │
│  │  or 2nd GPU │    │  Mediates   │    │  Sees GPU   │      │
│  └─────────────┘    │   Access    │    │  as Native  │      │
│                     └──────┬──────┘    └──────┬──────┘      │
│                            │                  │             │
│                            └─────────┬────────┘             │
│                                      │                      │
│                              ┌───────▼───────┐              │
│                              │ Physical GPU  │              │
│                              │  (RTX 3080)   │              │
│                              └───────────────┘              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
The VM sees the GPU as if it’s directly connected. No virtualization layer between the VM and the hardware.
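To make the "VFIO mediates access" part concrete: the kernel exposes each IOMMU group as a character device under /dev/vfio/, and each PCI device links to its group in sysfs. The helper below is only a sketch of that lookup. The function name and the SYSFS_ROOT override are my own illustrative additions, and the /dev/vfio node only appears once the devices in the group are bound to a VFIO driver.

```shell
# Sketch: resolve the /dev/vfio/<group> node for a PCI device.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake tree; on a real host leave it unset.
vfio_node_for_device() (
    addr="$1"                                   # e.g. 0000:01:00.0
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr/iommu_group"
    # Without an active IOMMU the symlink does not exist.
    [ -L "$link" ] || { echo "no IOMMU group for $addr" >&2; exit 1; }
    # The symlink target ends in the numeric group ID.
    group=$(basename "$(readlink "$link")")
    echo "/dev/vfio/$group"
)
```

On a host set up as in this post, `vfio_node_for_device 0000:01:00.0` would print the node that QEMU opens for the RTX 3080's group.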
Step 1: Verify Hardware Support
First, I checked if my CPU and motherboard support the necessary virtualization features:
# Check if CPU virtualization is enabled
grep -Ec '(vmx|svm)' /proc/cpuinfo
# Output should be > 0
# Check for IOMMU support (AMD)
dmesg | grep -e AMD-Vi
# For Intel, check:
dmesg | grep -e DMAR -e IOMMU
If the IOMMU check returns nothing, IOMMU has to be enabled in the BIOS first:
- AMD: Enable “AMD-Vi” or “IOMMU” in BIOS
- Intel: Enable “VT-d” in BIOS
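The CPU check above can be wrapped in a small helper that also reports which vendor flag was found, which tells you whether to look for AMD-Vi or VT-d in the BIOS. This is only a sketch: the function name and the optional file argument are my own additions for illustration.

```shell
# Sketch: report which hardware virtualization flag a cpuinfo-style file lists.
# The file argument is an assumption added for testing; it defaults to /proc/cpuinfo.
cpu_virt_flag() (
    file="${1:-/proc/cpuinfo}"
    if grep -qw vmx "$file"; then
        echo "vmx (Intel VT-x, look for VT-d in BIOS)"
    elif grep -qw svm "$file"; then
        echo "svm (AMD-V, look for AMD-Vi or IOMMU in BIOS)"
    else
        # Neither flag present: virtualization is unsupported or disabled.
        echo "none"
        exit 1
    fi
)
```

Calling `cpu_virt_flag` with no argument reads the running system's /proc/cpuinfo and returns a non-zero status when neither flag is present.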
Step 2: Enable IOMMU in Kernel
I edited the GRUB configuration in /etc/default/grub to enable IOMMU at boot:
# For AMD CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt"
# For Intel CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt"
The iommu=pt option puts the IOMMU in passthrough mode for devices the host keeps, so only devices assigned to a guest go through DMA translation. That keeps the overhead on the host low.
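If you prefer not to edit the file by hand, the change can be scripted so that running it twice does not duplicate parameters. The following is a minimal sketch, assuming GNU sed and a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT entry. The function name add_kernel_param is hypothetical.

```shell
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT if it is missing.
# add_kernel_param and its optional file argument are illustrative names.
add_kernel_param() (
    param="$1"
    file="${2:-/etc/default/grub}"
    # Extract the current value between the double quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
    # Whole-word match so "iommu=pt" is not confused with "amd_iommu=on".
    case " $current " in
        *" $param "*) exit 0 ;;
    esac
    # Append the parameter inside the closing quote (GNU sed in-place edit).
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"|" "$file"
)
```

On the AMD host from this post that would be `sudo` access plus `add_kernel_param amd_iommu=on` and `add_kernel_param iommu=pt`, followed by the usual GRUB update.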
Then updated GRUB and rebooted:
sudo update-grub
sudo reboot
After reboot, I verified IOMMU was active:
dmesg | grep -e IOMMU
# Should show: AMD-Vi/Intel VT-d enabled
Step 3: Identify GPU for Passthrough
I needed to find my GPU’s PCI address and ensure it was in its own IOMMU group:
lspci -nn | grep -i nvidia
# Output example:
# 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
# 01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
The GPU has two devices: the graphics card (01:00.0) and its audio controller (01:00.1). Both need to be passed through.
Then I checked IOMMU groups:
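#!/bin/bash
# List every IOMMU group and the devices it contains, in numeric group order.
# The groups are sorted before the loop; sorting the finished output would
# separate the device lines from their group headers.
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nn -s "${d##*/}")"
    done
done
If the GPU shares an IOMMU group with other devices, an ACS override patch may be needed. For my setup, the GPU was isolated in its own group.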
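Reading the full group listing by eye gets tedious on boards with many devices. As a rough sketch, the check "is the GPU alone in its group?" can be automated by listing everything in the GPU's group that is not a function of the same card. The function name and the SYSFS_ROOT override are hypothetical. A PCIe root port or bridge in the same group is typically fine, and this sketch does not distinguish that case.

```shell
# Sketch: print devices that share an IOMMU group with the given PCI device
# but belong to a different slot. SYSFS_ROOT is a hypothetical test override.
group_strangers() (
    addr="$1"                        # e.g. 0000:01:00.0
    slot="${addr%.*}"                # strip the function number -> 0000:01:00
    for dev in "${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr/iommu_group/devices/"*; do
        [ -e "$dev" ] || continue    # glob did not match anything
        name=$(basename "$dev")
        case "$name" in
            "$slot".*) ;;            # another function of the same card (e.g. its audio)
            *) echo "$name" ;;       # a different device sharing the group
        esac
    done
)
```

Empty output from `group_strangers 0000:01:00.0` means only the GPU's own functions are in the group.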
Step 4: Bind GPU to VFIO Driver
The host must not use the GPU. I needed to bind it to the VFIO driver instead of the NVIDIA driver.
First, I found the device IDs:
lspci -nn | grep NVIDIA
# Note the IDs in brackets, e.g., [10de:2206] and [10de:1aef]
Then I created a VFIO configuration in /etc/modprobe.d/vfio.conf:
# Bind NVIDIA GPU and audio to VFIO
options vfio-pci ids=10de:2206,10de:1aef disable_vga=1
I also needed to ensure VFIO loads before the NVIDIA drivers, so I listed the VFIO modules for the initramfs (on Ubuntu this is typically /etc/initramfs-tools/modules). On kernel 6.2 and newer, vfio_virqfd is part of the vfio module and can be left out:
vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci
Then I rebuilt the initramfs and rebooted:
sudo update-initramfs -u
sudo reboot
After reboot, I verified the GPU was bound to VFIO:
lspci -nnk -d 10de:2206
# Kernel driver in use: vfio-pci
# (Not nvidia)
Step 5: Keep a GPU for the Host
This is where I made a mistake the first time. I passed through my only GPU and couldn’t see anything on my monitor.
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  WRONG: Pass through only GPU → Host has no display         │
│                                                             │
│  RIGHT: Keep one GPU for host, pass through second GPU      │
│                                                             │
│  Options:                                                   │
│  - Use iGPU (Intel integrated graphics) for host            │
│  - Buy a cheap secondary GPU for host display               │
│  - Use headless host (SSH only, no local display)           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
My solution: I used the motherboard’s HDMI output connected to the iGPU for the host display, leaving the RTX 3080 exclusively for the VM.
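A quick way to avoid my mistake is to check, before starting the VM, that at least one display adapter is still owned by a regular host driver. The sketch below parses `lspci -nnk` output from standard input. Both function names are my own, and it assumes the usual lspci layout in which the "Kernel driver in use" line follows the device line.

```shell
# Sketch: print "<pci-address> <driver>" for every display adapter found in
# `lspci -nnk` output read from stdin. Function names are illustrative.
gpu_drivers() {
    awk '
        # A device line starts with the PCI address; remember if it is a GPU.
        /^[0-9a-f]/ {
            is_gpu = ($0 ~ /VGA compatible controller|3D controller|Display controller/)
            addr = $1
        }
        # The driver line belongs to the most recent device line.
        is_gpu && /Kernel driver in use:/ { print addr, $NF }
    '
}

# Succeeds only if some display adapter is bound to a driver other than vfio-pci.
host_has_display_gpu() {
    gpu_drivers | grep -qv ' vfio-pci$'
}
```

Typical use would be `lspci -nnk | host_has_display_gpu || echo "host would lose its display"`.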
Step 6: Create the VM with GPU Passthrough
I used virt-manager (GUI) or virsh (CLI) to create the VM. Here’s the libvirt XML configuration for GPU passthrough:
<devices>
  <!-- GPU passthrough -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
  </hostdev>
  <!-- GPU Audio passthrough -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
  </hostdev>
</devices>
The managed='yes' option lets libvirt automatically bind/unbind the device when starting/stopping the VM.
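Typing the hex fields by hand is error-prone, so here is a rough sketch that builds a hostdev element from a host PCI address in the `domain:bus:slot.function` form that `lspci -D` prints. The helper name hostdev_xml is hypothetical. Unlike the XML above, it omits the guest-side address element and leaves that choice to libvirt.

```shell
# Sketch: print a libvirt <hostdev> element for a host PCI address.
# hostdev_xml is an illustrative helper, not part of libvirt.
hostdev_xml() (
    addr="$1"                 # e.g. 0000:01:00.1
    domain=${addr%%:*}        # 0000
    rest=${addr#*:}           # 01:00.1
    bus=${rest%%:*}           # 01
    slotfn=${rest#*:}         # 00.1
    slot=${slotfn%.*}         # 00
    fn=${slotfn#*.}           # 1
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$fn'/>
  </source>
</hostdev>
EOF
)
```

For example, `hostdev_xml 0000:01:00.0 > gpu.xml` writes a file that can then be pasted into the domain definition or attached with `virsh attach-device`.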
Key settings for performance:
<domain type='kvm'>
  <!-- Enable memory backing for hugepages -->
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <!-- CPU pinning for better performance -->
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <!-- ... more pinning -->
  </cputune>
  <!-- Features for better performance (the hyperv block mainly helps Windows guests) -->
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
</domain>
Step 7: Install NVIDIA Drivers in the VM
Inside the guest VM, I installed the NVIDIA drivers normally:
# Add NVIDIA repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# Install driver
sudo apt install nvidia-driver-535
# Reboot VM
sudo reboot
After reboot, I verified the GPU was recognized:
nvidia-smi
# Should show RTX 3080 with full VRAM
Performance Results
Now let me show you the performance comparison:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Metric              Before (virtio)    After (VFIO)        │
│  ─────────────────────────────────────────────────────      │
│  LLM tokens/sec      5                  78                  │
│  CUDA available      No                 Yes                 │
│  VRAM accessible     None               Full 10GB           │
│  PyTorch CUDA        Fallback CPU       Native GPU          │
│                                                             │
│  Performance gain: 15.6x faster                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Inside the VM, it truly “doesn’t feel like a VM” when running AI workloads.
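For anyone repeating the comparison with their own numbers, the two ratios that matter are the speedup over the virtual GPU and the share of bare-metal throughput that survives inside the VM. A small sketch of that arithmetic, using the figures from this post (the function name is my own):

```shell
# Sketch: speedup over the virtual GPU and share of bare-metal throughput.
# Arguments: tokens/sec before, after, and on bare metal.
speedup() {
    awk -v before="$1" -v after="$2" -v bare="$3" 'BEGIN {
        printf "speedup: %.1fx, bare-metal share: %.1f%%\n", after / before, 100 * after / bare
    }'
}

speedup 5 78 80
# prints: speedup: 15.6x, bare-metal share: 97.5%
```

So passthrough recovers all but about 2.5% of the bare-metal rate in my measurements.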
The Reason It Works
VFIO passthrough works because:
- No Virtualization Layer: The VM kernel talks directly to the GPU hardware through VFIO, not through a virtual driver.
- DMA Access: The GPU can do Direct Memory Access to VM memory, just like bare metal.
- Interrupt Passthrough: GPU interrupts go directly to the VM, not through a hypervisor layer.
- Full VRAM Access: The VM sees and uses all GPU memory natively.
The tradeoff: the host cannot use this GPU while the VM is running. The GPU is exclusive to one or the other.
Common Mistakes
I made these mistakes during setup:
Mistake 1: Not Enabling IOMMU in BIOS
dmesg | grep IOMMU
# (empty output)
# Fix: Enable VT-d (Intel) or AMD-Vi in BIOS
Mistake 2: Passing Both GPUs to the VM
Host had no display after starting VM
Had to SSH into host to stop VM
Fix: Keep iGPU or secondary GPU for host
Mistake 3: Forgetting GPU Audio Device
The GPU has an audio device that must also be passed through, or some applications will fail:
lspci -nn | grep -i nvidia
# 01:00.0 VGA [0300]: ... [10de:2206]
# 01:00.1 Audio [0403]: ... [10de:1aef] ← Don't forget this!
Mistake 4: Not Using Hugepages
Without hugepages, memory translation overhead can hurt GPU performance:
# Add to /etc/sysctl.conf
# (8192 pages x 2 MiB default hugepage size = 16 GiB; size this to the VM's memory)
vm.nr_hugepages = 8192
# Apply
sudo sysctl -p
Quick Reference Card
# 1. Enable IOMMU in BIOS (VT-d for Intel, AMD-Vi for AMD)
# 2. Enable IOMMU in kernel (add to /etc/default/grub, then update GRUB)
# Use amd_iommu=on instead of intel_iommu=on on AMD CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt"
sudo update-grub
# 3. Find GPU device IDs
lspci -nn | grep NVIDIA
# 4. Bind to VFIO (create /etc/modprobe.d/vfio.conf)
options vfio-pci ids=10de:XXXX,10de:YYYY disable_vga=1
# 5. Update initramfs
sudo update-initramfs -u
# 6. Reboot
sudo reboot
# 7. Verify binding
lspci -nnk -d 10de:XXXX
# Should show: Kernel driver in use: vfio-pci
# 8. Configure VM with hostdev entries for GPU
# 9. Install NVIDIA drivers inside VM
# 10. Verify with nvidia-smi
Summary
In this post, I showed how to configure GPU passthrough using VFIO/KVM to give AI agent VMs native GPU performance. The key points:
- Enable IOMMU in BIOS and kernel for hardware passthrough support
- Bind the GPU to the VFIO driver instead of the NVIDIA driver on the host
- Keep a separate GPU (or iGPU) for host display
- Pass through both GPU and its audio device
- Configure VM with hostdev entries for the PCI devices
GPU passthrough gives you the best of both worlds: near-native GPU performance inside a fully isolated VM. You can run AI agents with full acceleration while keeping them completely separate from your main system.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Reddit r/ClaudeAI: GPU Passthrough Discussion
- Linux VFIO Documentation
- KVM GPU Passthrough Guide
- NVIDIA VFIO Setup Guide
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!