
Is RTX 5090 Worth Upgrading from RTX 4090 for Local LLMs? Real User Experiences

I almost pulled the trigger on an RTX 5090 upgrade. Then I did the math.

gpu_comparison.txt
RTX 4090: 24GB VRAM, $1,600-1,800 (current card)
RTX 5090: 32GB VRAM, $2,000-2,500 (upgrade cost: $400-900)
Speed improvement: 15-25%
VRAM improvement: 8GB

That’s when I realized: for local LLMs, VRAM is the bottleneck. The 5090 adds only 8GB, and that still doesn’t get you to the next class of models.

Here’s my deep dive into whether the RTX 5090 is worth it for local LLM enthusiasts.

The VRAM Ceiling Problem

Running local LLMs is fundamentally about VRAM. Not CUDA cores, not clock speed—VRAM.

Why? Because the model weights need to live somewhere. Here’s the math:

Model Size | Q4 Size (approx.) | Minimum VRAM | Comfortable VRAM
7B | ~5GB | 6GB | 8GB
13B | ~8GB | 10GB | 16GB
27B | ~16GB | 18GB | 24GB
30B | ~18GB | 20GB | 24GB
70B | ~40GB | 45GB | 48GB+
120B | ~70GB | 80GB | 96GB+

The RTX 4090 has 24GB of VRAM and the RTX 5090 has 32GB. This means:

  • Both can run models up to 27B-30B at Q4 comfortably (the 5090 with more room for context)
  • Neither can run 70B+ models without offloading or multi-GPU
  • The speed difference doesn’t matter if the model doesn’t fit
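The table above follows a simple rule of thumb: weight memory is parameter count times bits per weight, plus headroom for the KV cache and runtime. Here is a minimal sketch of that rule. It assumes an effective ~4.5 bits per weight for Q4-style quantization and a flat 15% overhead. Both numbers are illustrative assumptions, not measurements, and the helper names are hypothetical.

```python
def q4_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: float = 4.5, overhead: float = 0.15) -> bool:
    """True if weights plus an assumed flat overhead fit in the given VRAM."""
    needed = q4_weights_gb(params_billion, bits_per_weight) * (1 + overhead)
    return needed <= vram_gb


# Compare 24GB and 32GB budgets for the model sizes in the table
for size in (7, 13, 27, 30, 70, 120):
    need = q4_weights_gb(size) * 1.15
    print(f"{size:>4}B: ~{need:5.1f} GB needed | "
          f"24GB: {fits_in_vram(size, 24)} | 32GB: {fits_in_vram(size, 32)}")
```

Under these assumptions, a 70B model needs roughly 45GB. That lines up with the table's minimum and stays out of reach for both a 24GB and a 32GB budget.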

What Reddit Users Actually Reported

I dug through a recent Reddit thread with 63 upvotes about this exact question. Here’s what actual users reported:

User 1: Upgrading from 3090

reddit_testimonial_1.txt
"My 5090 did about 25% better for RL than my 3090, and actually
ran with less power for the amount of work being done."

Key insight: The 25% improvement was measured from a 3090 to a 5090, on an RL workload rather than LLM inference. The gap between a 4090 and a 5090 is narrower.

User 2: Current 4090 Owner

reddit_testimonial_2.txt
"I have a 4090, and according to testing, the 5090 barely ranks
higher. The 4090 is just fine. And way cheaper."

Key insight: For LLM workloads specifically, the performance delta is minimal.

User 3: VRAM Constraints

reddit_testimonial_3.txt
"5090 simply won't work unless I limit it to run a single model"

Key insight: The extra 8GB helps at the margins, but the ceiling doesn’t move far. If you’re hitting 24GB limits on a 4090 because you want 70B-class models, you’ll hit them on a 5090 too.

User 4: Power Efficiency

reddit_testimonial_4.txt
"5090 runs cooler and more efficient per computation unit"

Key insight: This is the real advantage. For 24/7 inference servers, efficiency compounds.
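To put "more efficient per computation unit" into numbers, you can divide throughput by power draw. The sketch below uses this article's own estimates from the benchmark and power scripts below: 45 vs 55 tokens/sec, and 400W vs 350W effective draw. These are assumed figures, not measurements, and the helper names are hypothetical.

```python
def tokens_per_joule(tokens_per_sec: float, watts: float) -> float:
    """Tokens generated per joule of energy (1 W = 1 J/s)."""
    return tokens_per_sec / watts


def kwh_per_million_tokens(tokens_per_sec: float, watts: float) -> float:
    """Energy in kWh needed to generate one million tokens."""
    seconds = 1_000_000 / tokens_per_sec
    return watts * seconds / 3_600_000  # 1 kWh = 3.6 million joules


# Assumed figures (estimates used elsewhere in this article)
rtx_4090 = tokens_per_joule(45.0, 400.0)
rtx_5090 = tokens_per_joule(55.0, 350.0)
print(f"RTX 4090: {rtx_4090:.3f} tokens/J, "
      f"{kwh_per_million_tokens(45.0, 400.0):.2f} kWh per 1M tokens")
print(f"RTX 5090: {rtx_5090:.3f} tokens/J, "
      f"{kwh_per_million_tokens(55.0, 350.0):.2f} kWh per 1M tokens")
print(f"Efficiency gain: {(rtx_5090 / rtx_4090 - 1) * 100:.0f}%")
```

Under these assumptions, the 5090 gets roughly 40% more tokens per joule. How much that is worth in dollars depends on how many hours a day the card is actually busy.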

Benchmarking the Decision

I wrote a script that compares the two cards using my estimated throughput numbers and scores the upgrade:

benchmark_gpu_value.py
from dataclasses import dataclass

@dataclass
class GPUMetrics:
    name: str
    vram_gb: float
    estimated_tokens_per_sec: float  # my estimate, not a measured benchmark
    upgrade_cost: float

def calculate_upgrade_value(current: GPUMetrics, upgrade: GPUMetrics,
                            target_model_vram_gb: float = 45.0) -> dict:
    """
    Calculate whether a GPU upgrade makes financial sense.
    Returns analysis of speed gains vs. cost.
    target_model_vram_gb: VRAM needed by the next model you want to run
    (45GB is the 70B Q4 minimum from the table above).
    """
    speed_improvement_pct = ((upgrade.estimated_tokens_per_sec -
                              current.estimated_tokens_per_sec)
                             / current.estimated_tokens_per_sec * 100)
    # VRAM comparison
    vram_improvement = upgrade.vram_gb - current.vram_gb
    # Extra VRAM only counts if it lets you load a model you can't load today
    unlocks_target = current.vram_gb < target_model_vram_gb <= upgrade.vram_gb
    # Cost per percentage point of speed gain
    cost_per_percent = (upgrade.upgrade_cost / speed_improvement_pct
                        if speed_improvement_pct > 0 else float('inf'))
    return {
        "speed_improvement_pct": speed_improvement_pct,
        "vram_improvement_gb": vram_improvement,
        "upgrade_cost": upgrade.upgrade_cost,
        "cost_per_percent_speed": cost_per_percent,
        # Worth it if the VRAM unlocks the target model or speed jumps >30%
        "worth_it": unlocks_target or speed_improvement_pct > 30
    }

# My analysis
rtx_4090 = GPUMetrics("RTX 4090", 24.0, 45.0, 0)    # Current card
rtx_5090 = GPUMetrics("RTX 5090", 32.0, 55.0, 500)  # Net upgrade cost
result = calculate_upgrade_value(rtx_4090, rtx_5090)
print(f"Speed improvement: {result['speed_improvement_pct']:.1f}%")
print(f"VRAM improvement: {result['vram_improvement_gb']:.1f}GB")
print(f"Cost: ${result['upgrade_cost']}")
print(f"Worth it? {result['worth_it']}")

Output:

benchmark_output.txt
Speed improvement: 22.2%
VRAM improvement: 8.0GB
Cost: $500
Worth it? False

For me, paying $500 for a 22% speed improvement plus 8GB of VRAM that still won’t fit a 70B model doesn’t make sense.

When Does RTX 5090 Actually Make Sense?

I analyzed different scenarios to see who should upgrade:

upgrade_decision_matrix.yaml
# SCENARIO 1: Already own RTX 4090
owns_4090:
  recommendation: "SKIP"
  reasoning:
    - "Only 8GB more VRAM (32GB), still short of 70B at Q4"
    - "Only 15-25% speed gain"
    - "Upgrade cost doesn't justify marginal improvement"
  verdict: "Not worth it for LLM workloads"
# SCENARIO 2: Own RTX 3090 or older
owns_3090_or_older:
  recommendation: "CONSIDER"
  reasoning:
    - "25%+ performance gain from 3090"
    - "Better power efficiency"
    - "Newer architecture benefits"
  verdict: "Worth evaluating, especially if selling old card"
# SCENARIO 3: Building new system
new_build:
  recommendation: "CONSIDER 5090"
  reasoning:
    - "Better longevity"
    - "Higher resale value"
    - "More efficient at load"
  verdict: "5090 preferred for new builds"
# SCENARIO 4: 24/7 inference server
inference_server:
  recommendation: "YES"
  reasoning:
    - "Power efficiency compounds over time"
    - "Lower heat output"
    - "Better for continuous workloads"
  verdict: "Efficiency gains justify upgrade"
# SCENARIO 5: Multi-model serving
multi_model:
  recommendation: "MAYBE"
  reasoning:
    - "Better memory bandwidth"
    - "Handles concurrent requests better"
  verdict: "Depends on workload specifics"

The Power Efficiency Argument

Here’s where the 5090 genuinely wins. I calculated the electricity cost difference for a 24/7 inference setup:

power_cost_analysis.py
def calculate_annual_power_cost(watts: float, hours_per_day: float,
                                cost_per_kwh: float = 0.12) -> float:
    """Calculate annual electricity cost for a GPU."""
    kwh_per_day = (watts * hours_per_day) / 1000
    kwh_per_year = kwh_per_day * 365
    return kwh_per_year * cost_per_kwh

# Rated board power (for reference; not used in the calculation below)
rtx_4090_tdp = 450  # watts
rtx_5090_tdp = 575  # watts
# Assumed average draw under the same LLM workload (my estimates, not measurements).
# The 5090 finishes work faster, so it can draw less on average despite a higher TDP.
rtx_4090_effective = 400  # watts
rtx_5090_effective = 350  # watts
# 24/7 inference server
cost_4090 = calculate_annual_power_cost(rtx_4090_effective, 24)
cost_5090 = calculate_annual_power_cost(rtx_5090_effective, 24)
print(f"RTX 4090 annual power cost: ${cost_4090:.2f}")
print(f"RTX 5090 annual power cost: ${cost_5090:.2f}")
print(f"Annual savings with 5090: ${cost_4090 - cost_5090:.2f}")

Output:

power_analysis_output.txt
RTX 4090 annual power cost: $420.48
RTX 5090 annual power cost: $367.92
Annual savings with 5090: $52.56

For a 24/7 server, you save about $50/year in electricity at $0.12/kWh. Over 5 years, that’s $250, or half of a $500 upgrade cost recovered through efficiency.

Checking Your Current VRAM Usage

Before deciding on an upgrade, check what you actually need:

check_vram_needs.py
import torch

def analyze_vram_usage():
    """Detailed VRAM analysis for your current setup."""
    if not torch.cuda.is_available():
        print("No CUDA GPU available")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # mem_get_info asks the driver for device-wide free/total memory, so it
        # also counts VRAM held by other processes (e.g. a running inference server)
        free_bytes, total_bytes = torch.cuda.mem_get_info(i)
        total_vram = total_bytes / (1024**3)
        available = free_bytes / (1024**3)
        used = total_vram - available
        # Memory reserved by PyTorch's caching allocator in this process only
        reserved = torch.cuda.memory_reserved(i) / (1024**3)
        print(f"GPU {i}: {props.name}")
        print(f"  Total VRAM: {total_vram:.1f} GB")
        print(f"  Currently used (all processes): {used:.1f} GB")
        print(f"  Reserved by this process: {reserved:.1f} GB")
        print(f"  Available: {available:.1f} GB")
        # Rough max model size: 4-bit = 0.5 bytes/param, keep 20% headroom
        max_4bit_params = (available * 0.8) / 0.5
        print(f"  Max model (4-bit): ~{max_4bit_params:.0f}B params")

analyze_vram_usage()

This gives you a rough upper bound on the model sizes you can run. If you’re consistently at 90%+ utilization, you might need more VRAM. The 5090’s extra 8GB only helps if your target model fits within 32GB.
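Free VRAM also has to hold the KV cache, which grows with context length. This is roughly the gap between the "Minimum" and "Comfortable" columns in the table near the top.

The sketch below applies the common estimate: keys and values, per layer, per token. The layer count, KV-head count and head dimension are illustrative assumptions for a 70B-class model with grouped-query attention. Check your model's config for the real values. The helper name is hypothetical.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GiB.

    2 (keys and values) x layers x KV heads x head dim x tokens x bytes.
    bytes_per_value=2 assumes an FP16/BF16 cache.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / (1024 ** 3)


# Hypothetical 70B-class config: 80 layers, 8 KV heads, head_dim 128
for ctx in (4_096, 8_192, 32_768):
    print(f"{ctx:>6} tokens: {kv_cache_gb(80, 8, 128, ctx):.2f} GiB")
```

With this assumed config, an FP16 cache takes 1.25 GiB at 4K tokens, 2.5 GiB at 8K and 10 GiB at 32K. That is why long contexts eat into whatever headroom is left after the weights.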

Alternative Solutions for VRAM Limits

If you’re hitting VRAM walls on your 4090, here are better options than paying for an 8GB bump:

Option 1: Dual RTX 3090 (48GB total)

dual_3090_config.txt
Cost: $1,200-1,600 (used cards)
VRAM: 48GB total (model layers split across both cards)
Pros:
- Can run 70B Q4 models
- Optional NVLink bridge speeds up GPU-to-GPU traffic
- Proven multi-GPU support
Cons:
- Higher power draw
- More complex setup
- Used cards have warranty risk
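With two cards, inference runtimes typically split the model's layers between them rather than presenting a single 48GB device. The sketch below shows a proportional split. It assumes every layer is the same size, which is a simplification. Real tools such as llama.cpp expose their own split settings, and `split_layers` is a hypothetical helper.

```python
from typing import List


def split_layers(n_layers: int, vram_per_gpu: List[float]) -> List[int]:
    """Assign layers to GPUs in proportion to their VRAM.

    Uses largest-remainder rounding so the counts always sum to n_layers.
    """
    total = sum(vram_per_gpu)
    shares = [n_layers * v / total for v in vram_per_gpu]
    counts = [int(s) for s in shares]
    # Hand leftover layers to the GPUs with the largest fractional remainders
    leftover = n_layers - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(split_layers(80, [24, 24]))  # two 24GB cards
print(split_layers(80, [24, 12]))  # an uneven pair
```

Two equal 24GB cards get 40 layers each, and an uneven pair gets a proportional share. Per-GPU overhead such as the CUDA context and KV cache is ignored here, which is one reason real splits are usually tuned by hand.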

Option 2: Mac Studio with Unified Memory

mac_studio_config.txt
Cost: $3,500-5,000
VRAM: 128GB unified memory
Pros:
- Can run 70B+ models
- Large context windows
- No VRAM fragmentation
Cons:
- Slower inference than discrete GPU
- Higher upfront cost
- Not upgradeable

Option 3: Wait for Cards with More Than 32GB

future_options.txt
Future consumer cards (e.g. RTX 6090): VRAM not announced; only worth waiting for if it goes beyond 32GB
Professional cards: 48GB+ available now at $5,000+

Common Mistakes to Avoid

Mistake 1: Confusing Speed with Capacity

A faster GPU with the same VRAM doesn’t let you run larger models. It just runs the same models faster.

I see this all the time in forums: “I bought a 5090 to run 70B models.” That doesn’t work. You still need 40GB+ VRAM for Q4 70B.

Mistake 2: Ignoring Power Costs

If you run inference 24/7, efficiency matters. But even at 24/7, the power savings alone take about a decade to pay back the upgrade, and with occasional use they take far longer:

power_break_even.txt
Upgrade cost: $500
Annual power savings: $50
Break-even time: 10 years
Conclusion: Power savings alone don't justify upgrade
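The break-even figure scales with how many hours a day the GPU actually runs. The sketch below reuses this article's assumptions: a 50W lower effective draw for the 5090, $0.12/kWh, and a $500 net upgrade cost. The function names are hypothetical.

```python
def annual_savings(watts_saved: float, hours_per_day: float,
                   cost_per_kwh: float = 0.12) -> float:
    """Yearly electricity savings in dollars from a lower power draw."""
    return watts_saved * hours_per_day * 365 / 1000 * cost_per_kwh


def payback_years(upgrade_cost: float, watts_saved: float,
                  hours_per_day: float, cost_per_kwh: float = 0.12) -> float:
    """Years until power savings alone cover the upgrade cost."""
    savings = annual_savings(watts_saved, hours_per_day, cost_per_kwh)
    return upgrade_cost / savings if savings > 0 else float("inf")


for hours in (24, 8, 2):
    print(f"{hours:>2} h/day: ${annual_savings(50, hours):6.2f}/year, "
          f"payback {payback_years(500, 50, hours):5.1f} years")
```

At 24 hours a day, payback comes to about 9.5 years; the box above rounds the savings to $50, hence 10. At 8 hours a day it takes about 28.5 years, and at 2 hours a day more than a century.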

Mistake 3: Forgetting About Multi-GPU

Two used RTX 3090s ($1,200-1,600) give you 48GB VRAM. A single RTX 5090 ($2,000+) gives you 32GB VRAM.

For LLM workloads specifically, a multi-GPU setup with more total VRAM often beats a single faster card.
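A quick way to compare these options is dollars per GB of VRAM. The sketch below uses the midpoints of the price ranges quoted in this article and the desktop RTX 5090's 32GB. The midpoints are an assumption, and street prices move.

```python
def dollars_per_gb(price: float, vram_gb: float) -> float:
    """Price divided by VRAM capacity."""
    return price / vram_gb


# Midpoints of the price ranges quoted in this article (assumed, not current quotes)
options = {
    "2x RTX 3090 (used)": (1_400, 48),
    "RTX 4090": (1_700, 24),
    "RTX 5090": (2_250, 32),
}
for name, (price, vram) in options.items():
    print(f"{name:<20} ${dollars_per_gb(price, vram):6.2f} per GB")
```

On these midpoints, the used 3090 pair costs roughly $29 per GB. The 4090 and 5090 both land around $70 per GB.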

Mistake 4: Not Checking Actual Specs

Not every GPU sold under the RTX 5090 name has the same VRAM. Always verify:

  • Desktop RTX 5090: 32GB GDDR7
  • RTX 5090 Laptop GPU: 24GB GDDR7
  • Professional cards: 48GB+ available at higher cost

The Decision Framework

I created this decision tree to help evaluate the upgrade:

upgrade_decision_tree.txt
START: Do you own an RTX 4090?
|
+--[YES]--> Are you hitting VRAM limits?
|           |
|           +--[YES]--> 5090 helps only if your model fits in 32GB.
|           |           Otherwise consider multi-GPU or Mac Studio.
|           |
|           +--[NO]--> Is speed a bottleneck?
|                      |
|                      +--[YES]--> Is 20% faster worth $500?
|                      |           |
|                      |           +--[YES]--> Upgrade to 5090
|                      |           |
|                      |           +--[NO]--> Keep 4090
|                      |
|                      +--[NO]--> Keep 4090
|
+--[NO]--> Do you own RTX 3090 or older?
           |
           +--[YES]--> Consider 5090 for 25%+ speed gain + efficiency
           |
           +--[NO]--> Building new?
                      |
                      +--[YES]--> 5090 for longevity
                      |
                      +--[NO]--> Re-evaluate your needs
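The same tree can be written as a small function, which makes each branch easy to test. This is a sketch only. The parameter names are hypothetical, and the 32GB check reflects the desktop RTX 5090's capacity.

```python
RTX_5090_VRAM_GB = 32  # desktop RTX 5090 capacity


def upgrade_advice(owns_4090: bool, *, hitting_vram_limits: bool = False,
                   target_model_gb: float = float("inf"),
                   speed_is_bottleneck: bool = False,
                   speedup_worth_cost: bool = False,
                   owns_3090_or_older: bool = False,
                   building_new: bool = False) -> str:
    """Walk the upgrade decision tree and return a recommendation."""
    if owns_4090:
        if hitting_vram_limits:
            # Extra VRAM only matters if the model you want fits in 32GB
            if target_model_gb <= RTX_5090_VRAM_GB:
                return "5090 fits your target model; weigh the cost"
            return "Consider multi-GPU or Mac Studio"
        if speed_is_bottleneck and speedup_worth_cost:
            return "Upgrade to 5090"
        return "Keep 4090"
    if owns_3090_or_older:
        return "Consider 5090 for speed gain + efficiency"
    if building_new:
        return "5090 for longevity"
    return "Re-evaluate your needs"


# Example: 4090 owner who wants a 70B Q4 model (~45GB)
print(upgrade_advice(True, hitting_vram_limits=True, target_model_gb=45))
```

Keyword-only flags keep call sites readable. The default target size of infinity means "assume it doesn't fit" when no target is given.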

What I Decided

After all this analysis, I’m keeping my RTX 4090. Here’s why:

  1. VRAM is my bottleneck, not speed - I want to run larger (70B-class) models, and the 5090’s 32GB still isn’t enough for them.

  2. The upgrade cost doesn’t justify the gain - $500 for a 20% speed improvement and 8GB of VRAM that doesn’t unlock a bigger model is poor value.

  3. My next upgrade will be VRAM-focused - I’m saving for either dual 3090s (48GB) or a Mac Studio (128GB unified).

  4. The 4090 is still excellent - It handles everything I need, just not always as fast as a 5090 would.

Final Recommendations

Your Situation | Recommendation
Own RTX 4090 | Skip - Not worth the marginal upgrade
Own RTX 3090 or older | Consider - Meaningful speed and efficiency gains
Building new system | Buy 5090 - Better longevity and efficiency
Running 24/7 inference | Upgrade - Efficiency savings compound
VRAM-constrained | Skip 5090 unless your target model fits in 32GB - Otherwise look at multi-GPU or Mac alternatives

The RTX 5090 is an excellent GPU. But for local LLM workloads specifically, VRAM capacity matters more than inference speed. If you already have a 4090, you’re better off waiting for a card with substantially more VRAM, or investing in multi-GPU setups that actually expand your model options.

Final Words

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
