How to Set Up GPU Passthrough for AI Agent VMs: Native Performance with Full Isolation

Apr 1, 2026

The Problem

I wanted to run AI agents in a virtual machine for isolation, but GPU performance was terrible. LLM inference that my RTX 3080 handles at 80 tokens/second on bare metal crawled at 5 tokens/second inside the VM. The VM only had an emulated graphics adapter with no CUDA support, so inference fell back to the CPU.

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Bare Metal:  RTX 3080 → 80 tokens/sec (100%)               │
│                                                             │
│  Standard VM: Virtual GPU → 5 tokens/sec (6%)               │
│                                                             │
│  The difference: 16x slower in VM                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

I needed GPU performance inside the VM. That’s when I discovered GPU passthrough via VFIO.

Why This Matters

Running AI agents in a VM makes sense for security:

Isolated environment for untrusted code execution
Separate credentials and “burner accounts” inside the VM
Easy snapshot and rollback if something goes wrong
Complete separation from your main system

But AI workloads need GPU acceleration. Without it, running local LLMs or GPU-accelerated agents becomes impractical.

GPU passthrough solves this by giving the VM direct access to your physical GPU. Users on r/ClaudeAI report running exactly this kind of setup:

"with the GPU passthrough it doesn't feel like a VM"

"the vm is my main pc now for anything involving ai"

"I use burner accounts for credentials inside the VM"

Environment

Host OS: Ubuntu 22.04 LTS
GPU: NVIDIA RTX 3080
CPU: AMD Ryzen 9 (Intel works too, settings differ)
VM Software: KVM/QEMU with libvirt
Guest OS: Ubuntu 22.04

How GPU Passthrough Works

VFIO (Virtual Function I/O) is a Linux kernel framework that lets userspace programs safely access hardware devices. For GPU passthrough:

┌─────────────────────────────────────────────────────────────┐
│                        Host System                          │
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │
│  │   Host OS   │    │   VFIO      │    │   Guest    │     │
│  │             │    │   Driver    │    │   VM       │     │
│  │  Uses iGPU  │    │             │    │             │     │
│  │  or 2nd GPU │    │  Mediates   │    │  Sees GPU  │     │
│  └─────────────┘    │  Access     │    │  as Native  │     │
│                     └──────┬──────┘    └──────┬──────┘     │
│                            │                   │            │
│                            └─────────┬─────────┘            │
│                                      │                      │
│                              ┌───────▼───────┐              │
│                              │  Physical GPU │              │
│                              │  (RTX 3080)   │              │
│                              └───────────────┘              │
└─────────────────────────────────────────────────────────────┘

The VM sees the GPU as if it’s directly connected. No virtualization layer between the VM and the hardware.

Step 1: Verify Hardware Support

First, I checked if my CPU and motherboard support the necessary virtualization features:

# Check if the CPU advertises virtualization (vmx = Intel VT-x, svm = AMD-V)
grep -Ec '(vmx|svm)' /proc/cpuinfo
# Output should be > 0

# Check for IOMMU support (AMD); dmesg needs root on Ubuntu
sudo dmesg | grep -e AMD-Vi
# For Intel, check:
sudo dmesg | grep -e DMAR -e IOMMU

If the IOMMU check returns nothing, enable it in the BIOS first:

AMD: Enable “AMD-Vi” or “IOMMU” in BIOS
Intel: Enable “VT-d” in BIOS
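The grep above only counts matching lines. As a minimal sketch, the hypothetical cpu_virt_ext function below reports which extension the CPU advertises instead. It reads /proc/cpuinfo by default; the file path is a parameter only so the function can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which hardware virtualization extension the CPU
# advertises. vmx = Intel VT-x, svm = AMD-V.
# $1: cpuinfo-style file to read (defaults to /proc/cpuinfo)
cpu_virt_ext() {
  local cpuinfo=${1:-/proc/cpuinfo}
  if grep -qw vmx "$cpuinfo"; then
    echo "Intel VT-x (vmx)"
  elif grep -qw svm "$cpuinfo"; then
    echo "AMD-V (svm)"
  else
    echo "no vmx/svm flag: enable virtualization in the BIOS" >&2
    return 1
  fi
}
```

Run cpu_virt_ext with no argument on the host. It says nothing about the IOMMU, which still needs the dmesg check.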

Step 2: Enable IOMMU in Kernel

I edited /etc/default/grub to enable IOMMU at boot, using the line that matches the CPU:

# For AMD CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt"

# For Intel CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt"

The iommu=pt option puts devices that stay with the host into passthrough (identity-mapped) mode, so only devices assigned to a VM pay the cost of IOMMU address translation.
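Editing the line by hand is fine. If you would rather script it, here is a sketch of an idempotent edit, assuming the stock Ubuntu layout where the defaults sit on a single GRUB_CMDLINE_LINUX_DEFAULT="..." line. The function name add_kernel_params is hypothetical; it prints the edited file to stdout and never modifies the original.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GRUB defaults file with the given kernel
# parameters appended to GRUB_CMDLINE_LINUX_DEFAULT, skipping any that are
# already present. The input file is only read, never modified.
# $1: path to the GRUB defaults file; remaining args: parameters to add
add_kernel_params() {
  local file=$1; shift
  local re='^GRUB_CMDLINE_LINUX_DEFAULT="(.*)"$'
  local line params p
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      params=${BASH_REMATCH[1]}
      for p in "$@"; do
        # Pad with spaces so "iommu=pt" cannot match inside another word
        [[ " $params " == *" $p "* ]] || params="${params:+$params }$p"
      done
      line="GRUB_CMDLINE_LINUX_DEFAULT=\"$params\""
    fi
    printf '%s\n' "$line"
  done < "$file"
}
```

For example, add_kernel_params /etc/default/grub amd_iommu=on iommu=pt > /tmp/grub.new, then review /tmp/grub.new and copy it over /etc/default/grub with sudo before running update-grub. Running it a second time adds nothing.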

Then updated GRUB and rebooted:

sudo update-grub
sudo reboot

After reboot, I verified IOMMU was active:

sudo dmesg | grep -i -e DMAR -e AMD-Vi -e IOMMU
# Intel: look for "DMAR: IOMMU enabled"; AMD: look for AMD-Vi initialization lines
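The dmesg wording differs between kernels and vendors. A check that does not depend on it, assuming the standard sysfs layout: when the IOMMU is active, the kernel creates one numbered directory per IOMMU group under /sys/kernel/iommu_groups, so an empty directory means it is still off. iommu_group_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the IOMMU groups the kernel has created.
# Prints the count; returns non-zero when there are none (IOMMU inactive).
# $1: groups directory (defaults to /sys/kernel/iommu_groups)
iommu_group_count() {
  local root=${1:-/sys/kernel/iommu_groups}
  local n=0 g
  # The trailing slash makes the glob match directories only
  for g in "$root"/*/; do
    [[ -d $g ]] && n=$((n + 1))
  done
  echo "$n"
  (( n > 0 ))
}
```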

Step 3: Identify GPU for Passthrough

I needed to find my GPU’s PCI address and ensure it was in its own IOMMU group:

lspci -nn | grep -i nvidia
# Output example:
# 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
# 01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)

The GPU has two devices: the graphics card (01:00.0) and its audio controller (01:00.1). Both need to be passed through.
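Since the audio controller is just another function of the same PCI slot, one way not to miss it is to list every function that shares the GPU's bus:slot. A minimal sketch with a hypothetical function name, reading lspci -nn output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `lspci -nn` output on stdin, print the address of
# every function in the same bus:slot as the given device (01:00.0 -> 01:00.*).
# $1: PCI address of one GPU function, in the same format lspci prints
slot_functions() {
  local slot=${1%.*} addr rest   # rest swallows the description text
  while read -r addr rest; do
    if [[ ${addr%.*} == "$slot" ]]; then
      echo "$addr"
    fi
  done
}
```

With the card above, lspci -nn | slot_functions 01:00.0 should print 01:00.0 and 01:00.1.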

Then I checked IOMMU groups:

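#!/bin/bash
# List every IOMMU group (in numeric order) with the PCI devices it contains
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
  echo "${g##*/}"
done | sort -n | while read -r n; do
  echo "IOMMU Group $n:"
  for d in /sys/kernel/iommu_groups/"$n"/devices/*; do
    echo -e "\t$(lspci -nns "${d##*/}")"
  done
done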

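If the GPU shares an IOMMU group with other devices, every other endpoint in that group must also be handed to vfio-pci (PCI bridges excepted), which usually means passing it through as well. The alternative is the ACS override patch, which splits groups artificially but weakens the isolation the IOMMU is supposed to provide. For my setup, the GPU was isolated in its own group.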
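To see exactly which other devices would have to go along with the GPU, here is a minimal sketch (the name foreign_group_members is hypothetical) that finds the GPU's group under /sys/kernel/iommu_groups and prints any member that is not a function of the GPU's own slot. It takes the sysfs root as a parameter and does not special-case PCI bridges, so treat its output as a list to review rather than a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print devices that share the GPU's IOMMU group but are
# not functions of the GPU's own slot. Empty output means the group holds only
# the GPU's functions. PCI bridges are not filtered out; review what it prints.
# $1: IOMMU groups directory (normally /sys/kernel/iommu_groups)
# $2: full PCI address of the GPU, e.g. 0000:01:00.0
foreign_group_members() {
  local root=$1 gpu=$2 slot=${2%.*} group dev
  for group in "$root"/*; do
    [[ -e $group/devices/$gpu ]] || continue
    for dev in "$group"/devices/*; do
      dev=${dev##*/}
      [[ ${dev%.*} == "$slot" ]] || echo "$dev"
    done
    return 0
  done
  echo "$gpu not found under $root" >&2
  return 1
}
```

On the host: foreign_group_members /sys/kernel/iommu_groups 0000:01:00.0.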

Step 4: Bind GPU to VFIO Driver

The host must not use the GPU. I needed to bind it to the VFIO driver instead of the NVIDIA driver.

First, I found the device IDs:

lspci -nn | grep NVIDIA
# Note the IDs in brackets, e.g., [10de:2206] and [10de:1aef]
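Copying the IDs by hand works. As a sketch, the hypothetical vfio_ids function pulls every [10de:xxxx] vendor:device pair (10de is NVIDIA's vendor ID) out of lspci -nn output and joins them into the comma-separated form vfio-pci expects. It matches every NVIDIA device, so if the host keeps a second NVIDIA card, edit the list by hand; two identical cards share one ID, and an ids= list cannot tell them apart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `lspci -nn` output on stdin into the comma-separated
# vendor:device list used by "options vfio-pci ids=...".
# Only NVIDIA IDs (vendor 10de) are kept; duplicates are dropped, order preserved.
vfio_ids() {
  grep -oE '\[10de:[0-9a-f]{4}\]' |
    grep -oE '10de:[0-9a-f]{4}' |
    awk '!seen[$0]++' |
    paste -sd, -
}
```

For the card above, lspci -nn | vfio_ids prints 10de:2206,10de:1aef.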

Then I created /etc/modprobe.d/vfio.conf:

# Bind NVIDIA GPU and audio to VFIO
options vfio-pci ids=10de:2206,10de:1aef disable_vga=1

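I also listed the VFIO modules in /etc/initramfs-tools/modules so they load from the initramfs, before the NVIDIA driver can claim the card:

vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci

(On kernel 6.2 and later, vfio_virqfd is built into vfio, so that line can be dropped.)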
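Both files can also be written by a script. A minimal sketch, assuming Ubuntu's initramfs-tools layout; write_vfio_config is a hypothetical name, and its optional second argument is a root prefix (defaulting to /) so it can be tried in a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the vfio-pci options file and make sure the VFIO
# modules are listed for the initramfs. Safe to re-run: vfio.conf is rewritten
# and modules are only appended if missing.
# $1: comma-separated vendor:device IDs, e.g. 10de:2206,10de:1aef
# $2: optional root prefix for a dry run in a scratch directory (default: /)
write_vfio_config() {
  local ids=$1 root=${2:-}
  local modules_file="$root/etc/initramfs-tools/modules" m
  mkdir -p "$root/etc/modprobe.d" "${modules_file%/*}"
  printf 'options vfio-pci ids=%s disable_vga=1\n' "$ids" > "$root/etc/modprobe.d/vfio.conf"
  touch "$modules_file"
  # vfio_virqfd is only a separate module on kernels older than 6.2
  for m in vfio vfio_iommu_type1 vfio_virqfd vfio_pci; do
    grep -qxF "$m" "$modules_file" || echo "$m" >> "$modules_file"
  done
}
```

For the real files it needs root, for example sudo bash -c "$(declare -f write_vfio_config); write_vfio_config 10de:2206,10de:1aef", followed by update-initramfs as below.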

Then rebuilt initramfs and rebooted:

sudo update-initramfs -u
sudo reboot

After reboot, I verified the GPU was bound to VFIO:

lspci -nnk -d 10de:2206
# Kernel driver in use: vfio-pci
# (Not nvidia)
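To turn that visual check into something scriptable, here is a sketch with two hypothetical helpers: kernel_driver_in_use extracts the driver name from lspci -nnk output, and require_vfio fails unless every listed function reports vfio-pci.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking `lspci -nnk` output read from stdin.

# Print the driver named on each "Kernel driver in use:" line.
kernel_driver_in_use() {
  sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Succeed only if at least one driver line exists and every one is vfio-pci.
require_vfio() {
  local drivers drv
  drivers=$(kernel_driver_in_use)
  if [[ -z $drivers ]]; then
    echo "no 'Kernel driver in use' line: nothing is bound" >&2
    return 1
  fi
  while IFS= read -r drv; do
    if [[ $drv != vfio-pci ]]; then
      echo "found driver '$drv', expected vfio-pci" >&2
      return 1
    fi
  done <<< "$drivers"
  echo "OK: vfio-pci"
}
```

lspci -nnk -d 10de: | require_vfio checks the graphics and audio functions in one go.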

Step 5: Keep a GPU for the Host

This is where I made a mistake the first time. I passed through my only GPU and couldn’t see anything on my monitor.

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  WRONG: Pass through only GPU → Host has no display         │
│                                                             │
│  RIGHT: Keep one GPU for host, pass through second GPU      │
│                                                             │
│  Options:                                                   │
│  - Use the CPU's iGPU (integrated graphics) for host        │
│  - Buy a cheap secondary GPU for host display               │
│  - Use headless host (SSH only, no local display)           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

My solution: I used the motherboard’s HDMI output connected to the iGPU for the host display, leaving the RTX 3080 exclusively for the VM.
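Before binding anything, it is worth confirming that a display controller will remain for the host. A minimal sketch (hypothetical host_display_gpus) that reads lspci -nn on stdin and prints each display-class device (PCI class 03xx) whose vendor:device ID is not in the vfio-pci list; it exits non-zero when nothing would be left.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `lspci -nn` output on stdin, print display
# controllers (PCI class 03xx) whose vendor:device ID is NOT in the given
# vfio-pci list. Returns non-zero if no display controller would be left.
# $1: comma-separated IDs destined for vfio-pci, e.g. 10de:2206,10de:1aef
host_display_gpus() {
  local ids=",$1," line id status=1
  local re_class='\[03[0-9a-f]{2}\]'
  local re_id='\[([0-9a-f]{4}:[0-9a-f]{4})\]'
  while IFS= read -r line; do
    [[ $line =~ $re_class ]] || continue
    [[ $line =~ $re_id ]] || continue
    id=${BASH_REMATCH[1]}
    if [[ $ids != *",$id,"* ]]; then
      printf '%s\n' "$line"
      status=0
    fi
  done
  if (( status != 0 )); then
    echo "no display controller would be left for the host" >&2
  fi
  return "$status"
}
```

Usage: lspci -nn | host_display_gpus 10de:2206,10de:1aef.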

Step 6: Create the VM with GPU Passthrough

The VM can be created with virt-manager (GUI) or virsh (CLI). Here's the libvirt XML configuration for GPU passthrough:

<devices>
  <!-- GPU passthrough -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
  </hostdev>

  <!-- GPU Audio passthrough -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
  </hostdev>
</devices>

The managed='yes' option lets libvirt automatically bind/unbind the device when starting/stopping the VM.
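Rather than typing the hostdev blocks, they can be generated from the host PCI addresses. A sketch with a hypothetical hostdev_xml helper; it leaves out the guest-side address element, relying on libvirt to assign a free guest PCI slot when none is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a libvirt <hostdev> element for one PCI function.
# The guest-side <address> is omitted so libvirt picks a free guest slot.
# $1: full host PCI address, e.g. 0000:01:00.0
hostdev_xml() {
  local re='^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
  local domain bus slot func
  if [[ ! $1 =~ $re ]]; then
    echo "expected an address like 0000:01:00.0, got '$1'" >&2
    return 1
  fi
  IFS=':.' read -r domain bus slot func <<< "$1"
  cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$func'/>
  </source>
</hostdev>
EOF
}
```

For example, hostdev_xml 0000:01:00.0 > gpu.xml, then virsh attach-device <vm-name> gpu.xml --config, and the same for 0000:01:00.1 (<vm-name> is a placeholder).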

Key settings for performance:

<domain type='kvm'>
  <!-- Back guest RAM with hugepages (reserve them on the host first) -->
  <memoryBacking>
    <hugepages/>
  </memoryBacking>

  <!-- CPU pinning: tie each vCPU to a fixed host CPU -->
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <!-- ... more pinning -->
  </cputune>

  <!-- Pass the host CPU model and feature flags through to the guest -->
  <cpu mode='host-passthrough'/>

  <!-- Guest features -->
  <features>
    <acpi/>
    <apic/>
    <!-- Hyper-V enlightenments only benefit Windows guests; this Ubuntu guest can omit them -->
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
</domain>
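The pinning list is elided above. As a sketch, the hypothetical vcpupin_xml helper expands a comma-separated list of host CPU numbers into the vcpu and cputune elements, one vcpupin per entry. Which host CPUs to use is your call: lscpu -e shows which logical CPUs share a physical core, and keeping a core (often CPU 0) free for the host is common advice rather than a rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a comma-separated list of host CPUs into the
# <vcpu> and <cputune> elements, pinning vCPU i to the i-th listed host CPU.
# $1: host CPU numbers, e.g. 2,14,3,15
vcpupin_xml() {
  local -a cpus
  local i
  IFS=',' read -r -a cpus <<< "$1"
  printf "  <vcpu placement='static'>%d</vcpu>\n" "${#cpus[@]}"
  echo "  <cputune>"
  for i in "${!cpus[@]}"; do
    printf "    <vcpupin vcpu='%d' cpuset='%s'/>\n" "$i" "${cpus[i]}"
  done
  echo "  </cputune>"
}
```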

Step 7: Install NVIDIA Drivers in the VM

Inside the guest VM, I installed the NVIDIA drivers normally:

# Add NVIDIA repository
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

# Install driver
sudo apt install nvidia-driver-535

# Reboot VM
sudo reboot

After reboot, I verified the GPU was recognized:

nvidia-smi
# Should show RTX 3080 with full VRAM
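nvidia-smi's query mode makes this check scriptable: nvidia-smi --query-gpu=name,memory.total --format=csv,noheader prints one CSV line per GPU, of the form "<name>, <memory> MiB". The hypothetical check_gpu_visible helper below reads that on stdin and succeeds if some GPU reports at least the given amount of memory in MiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi CSV lines ("<name>, <memory> MiB") on
# stdin and succeed if any GPU reports at least $1 MiB of memory.
check_gpu_visible() {
  local min_mib=$1 name mem
  while IFS=',' read -r name mem; do
    mem=${mem//[!0-9]/}          # keep only the digits of " 10240 MiB"
    [[ -n $mem ]] || continue
    if (( mem >= min_mib )); then
      echo "OK: $name (${mem} MiB)"
      return 0
    fi
  done
  echo "no GPU with at least ${min_mib} MiB visible" >&2
  return 1
}
```

Inside the VM: nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | check_gpu_visible 9000, where 9000 is an arbitrary threshold below the 10 GB card's total.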

Performance Results

Here is the before-and-after comparison:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  Metric              Before (virtio)    After (VFIO)        │
│  ─────────────────────────────────────────────────────      │
│  LLM tokens/sec      5                  78                  │
│  CUDA available      No                 Yes                 │
│  VRAM accessible     None               Full 10GB           │
│  PyTorch CUDA        Fallback CPU       Native GPU          │
│                                                             │
│  Performance gain: 15.6x faster                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
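The headline figures follow from the token rates with plain arithmetic. A small sketch using hypothetical ratio and pct_of helpers, with awk doing the floating-point division:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: floating-point ratios via awk, rounded to one decimal.
ratio() {   # ratio A B -> A / B
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}
pct_of() {  # pct_of A B -> A as a percentage of B
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a * 100 / b }'
}
```

ratio 78 5 gives 15.6 and ratio 80 5 gives 16.0, matching the gains quoted in this post; pct_of 78 80 gives 97.5, so the VM reaches roughly 97.5% of the 80 tokens/sec bare-metal rate.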

Inside the VM, it truly “doesn’t feel like a VM” when running AI workloads.

The Reason It Works

VFIO passthrough works because:

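No Virtualization Layer: The guest's NVIDIA driver programs the real GPU directly; VFIO maps the device's registers into the VM instead of emulating a virtual GPU.
DMA Access: The GPU performs Direct Memory Access into VM memory as it would on bare metal, with the IOMMU translating those accesses and confining them to the VM's memory.
Interrupt Passthrough: KVM routes GPU interrupts into the VM with little overhead, and on hardware with posted-interrupt support they can be delivered without exiting to the hypervisor.
Full VRAM Access: The VM sees and uses all GPU memory natively.

The tradeoff: the GPU is exclusive. With the boot-time vfio-pci binding from Step 4, the host never drives the card; it is reserved for the VM.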

Common Mistakes

I made these mistakes during setup:

Mistake 1: Not Enabling IOMMU in BIOS

sudo dmesg | grep IOMMU
# (empty output)

# Fix: Enable VT-d (Intel) or AMD-Vi in BIOS

Mistake 2: Leaving the Host Without a Display GPU

Host had no display after starting VM
Had to SSH into host to stop VM

Fix: Keep iGPU or secondary GPU for host

Mistake 3: Forgetting GPU Audio Device

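The GPU's HDMI audio controller is a separate PCI function in the same IOMMU group. If it stays bound to a host driver such as snd_hda_intel, VFIO treats the group as not viable and the VM will not start, so pass it through as well: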

lspci -nn | grep -i nvidia
# 01:00.0 VGA [0300]: ... [10de:2206]
# 01:00.1 Audio [0403]: ... [10de:1aef]  ← Don't forget this!

Mistake 4: Not Using Hugepages

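Without reserved hugepages, guest RAM may be backed by ordinary 4 KiB pages, which adds TLB and page-table overhead. The <hugepages/> setting from Step 6 also requires the pages to be reserved on the host first: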

# Add to /etc/sysctl.conf
vm.nr_hugepages = 8192

# Apply
sudo sysctl -p
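vm.nr_hugepages counts pages, not bytes. With the usual 2 MiB hugepage size on x86-64, 8192 pages reserve 16 GiB, so the value should match the memory given to the VM, and that RAM is no longer available to the host. A sketch of the arithmetic with hypothetical helper names; the page size comes from the Hugepagesize line of /proc/meminfo.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing vm.nr_hugepages.

# Print the hugepage size in KiB from a meminfo-style file (default /proc/meminfo).
hugepage_size_kib() {
  awk '/^Hugepagesize:/ { print $2 }' "${1:-/proc/meminfo}"
}

# hugepages_needed VM_GIB [PAGE_KIB]: pages needed to back VM_GIB of guest RAM,
# rounded up. PAGE_KIB defaults to 2048 (2 MiB pages).
hugepages_needed() {
  local vm_gib=$1 page_kib=${2:-2048}
  echo $(( (vm_gib * 1024 * 1024 + page_kib - 1) / page_kib ))
}
```

hugepages_needed 16 "$(hugepage_size_kib)" prints 8192 on a system with 2 MiB pages; a 24 GiB VM would need 12288. After sysctl -p, compare the HugePages_Total line in /proc/meminfo with the requested count, since a fragmented system may reserve fewer.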

Quick Reference Card

# 1. Enable IOMMU in BIOS (VT-d for Intel, AMD-Vi for AMD)

# 2. Enable IOMMU in kernel (edit /etc/default/grub; use amd_iommu=on for AMD)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt"
sudo update-grub

# 3. Find GPU device IDs
lspci -nn | grep NVIDIA

# 4. Bind to VFIO (create /etc/modprobe.d/vfio.conf)
options vfio-pci ids=10de:XXXX,10de:YYYY disable_vga=1

# 5. Update initramfs
sudo update-initramfs -u

# 6. Reboot
sudo reboot

# 7. Verify binding
lspci -nnk -d 10de:XXXX
# Should show: Kernel driver in use: vfio-pci

# 8. Configure VM with hostdev entries for GPU
# 9. Install NVIDIA drivers inside VM
# 10. Verify with nvidia-smi

Summary

In this post, I showed how to configure GPU passthrough using VFIO/KVM to give AI agent VMs native GPU performance. The key points:

Enable IOMMU in BIOS and kernel for hardware passthrough support
Bind the GPU to VFIO driver instead of NVIDIA driver on host
Keep a separate GPU (or iGPU) for host display
Pass through both GPU and its audio device
Configure VM with hostdev entries for the PCI devices

GPU passthrough gives you the best of both worlds: near-native GPU performance inside a fully isolated VM. You can run AI agents with full acceleration while keeping them completely separate from your main system.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources that will help you in this area:

👨‍💻 Reddit r/ClaudeAI: GPU Passthrough Discussion
👨‍💻 Linux VFIO Documentation
👨‍💻 KVM GPU Passthrough Guide
👨‍💻 NVIDIA VFIO Setup Guide

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!