
AI Model Performance Benchmark: UCloud Speed and Cost Comparison 2026

I wasted an entire afternoon waiting for AI responses. My workflow was simple: ask a question, wait, get a response, repeat. But by the fifth “thinking…” spinner, I realized the model I picked was killing my productivity.

The real problem? I optimized for quality but ignored speed and cost. I picked the most capable model without considering that I’d make hundreds of API calls daily. When each call takes 20+ seconds, those delays compound into hours of lost time.

So I ran a benchmark. I tested 15+ models through UCloud’s standardized environment with identical prompts to measure actual throughput (tokens/second) and cost efficiency. Here’s what I found.

The Benchmark Setup

I used a creative writing prompt requesting approximately 200 words. Sending the identical prompt to every model keeps the workload the same, so differences in the results come from the models rather than the task. The test measured:

  1. Speed: Tokens generated per second (t/s)
  2. Cost: Price per 1,000 tokens
  3. Latency: Total time to complete the request
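The article doesn't show how each request was timed. Below is a minimal sketch of one way to capture both numbers. It assumes a streaming client that yields output tokens one at a time. `generate` and `fake_generate` are hypothetical placeholders, not UCloud's API. Real APIs often stream multi-token chunks, so a provider-reported token count would be more accurate than counting yielded items.

```python
import time
from typing import Callable, Iterable


def benchmark(generate: Callable[[str], Iterable[str]], prompt: str) -> dict:
    """Time one request end to end and derive tokens/second.

    `generate` is a hypothetical streaming client: it takes a prompt and
    yields output tokens one at a time (an assumption, see lead-in).
    """
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate(prompt))  # drain the stream, counting tokens
    elapsed = time.perf_counter() - start
    return {
        "tokens": n_tokens,
        "total_seconds": elapsed,                  # latency for the whole request
        "tokens_per_second": n_tokens / elapsed,   # throughput, incl. time to first token
    }


def fake_generate(prompt: str):
    """Hypothetical stand-in for a real API: 200 'tokens' with a small delay each."""
    for i in range(200):
        time.sleep(0.0005)
        yield f"tok{i}"


result = benchmark(fake_generate, "Write a short story of about 200 words.")
print(f"{result['tokens']} tokens in {result['total_seconds']:.2f}s "
      f"= {result['tokens_per_second']:.1f} t/s")
```

Because the timer starts before the request is sent, this version folds time-to-first-token into the tokens/second figure. Timing from the first token instead would report a higher rate.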

The results surprised me.

Speed Rankings: Who’s Actually Fast?

Speed Comparison (tokens/second)
Model                    | Speed (t/s) | Relative to GPT-5.1
-------------------------|-------------|--------------------
MiniMax-M2.1             | 85.4        | 4.8x faster
GPT-5.1-codex-mini       | 69.8        | 3.9x faster
Kimi-K2.5                | 51.0        | 2.8x faster
Claude-Haiku-4.5         | 46.9        | 2.6x faster
DeepSeek-V3.2            | 34.6        | 1.9x faster
Claude-Sonnet-4.5        | 28.2        | 1.6x faster
Claude-Opus-4.6          | 28.1        | 1.6x faster
GPT-5.1                  | 17.9        | baseline
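The "Relative to GPT-5.1" column divides each model's tokens/second by GPT-5.1's 17.9 t/s. This short sketch reproduces that column from the raw speeds:

```python
# Measured throughput in tokens/second, copied from the table above
speeds = {
    "MiniMax-M2.1": 85.4,
    "GPT-5.1-codex-mini": 69.8,
    "Kimi-K2.5": 51.0,
    "Claude-Haiku-4.5": 46.9,
    "DeepSeek-V3.2": 34.6,
    "Claude-Sonnet-4.5": 28.2,
    "Claude-Opus-4.6": 28.1,
    "GPT-5.1": 17.9,
}

baseline = speeds["GPT-5.1"]
# Each model's speed as a multiple of the GPT-5.1 baseline, rounded to one decimal
relative = {model: round(tps / baseline, 1) for model, tps in speeds.items()}

for model, tps in speeds.items():
    print(f"{model:<20} {tps:>5.1f} t/s  {relative[model]}x")
```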

The speed difference is massive. MiniMax-M2.1 generates tokens nearly 5x faster than GPT-5.1. For a typical 500-token code completion, that’s:

Time for 500 tokens
MiniMax-M2.1: ~5.9 seconds
GPT-5.1: ~27.9 seconds
Difference: 22 seconds per request

If I make 100 completions per day, that’s 36+ minutes saved daily, or 220+ hours per year if I kept that pace every day.
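The per-request and daily figures come from dividing output tokens by throughput. This sketch reproduces that arithmetic under two assumptions: generation time is the only delay (no network or queueing latency), and the same 100 requests happen every day of the year.

```python
TOKENS_PER_REQUEST = 500   # the "typical code completion" size used above
REQUESTS_PER_DAY = 100
DAYS_PER_YEAR = 365        # assumption: the same pace every day


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Generation time only: output tokens divided by throughput."""
    return tokens / tokens_per_second


fast = seconds_for(TOKENS_PER_REQUEST, 85.4)   # MiniMax-M2.1
slow = seconds_for(TOKENS_PER_REQUEST, 17.9)   # GPT-5.1
saved_per_request = slow - fast

daily_minutes = saved_per_request * REQUESTS_PER_DAY / 60
yearly_hours = daily_minutes * DAYS_PER_YEAR / 60

print(f"{fast:.1f}s vs {slow:.1f}s, {saved_per_request:.0f}s saved per request")
# 5.9s vs 27.9s, 22s saved per request
print(f"{daily_minutes:.1f} min/day, {yearly_hours:.0f} h/year")
# 36.8 min/day, 224 h/year
```

At a more typical 250 working days per year, the same arithmetic gives roughly 150 hours instead.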

But speed isn’t everything. Let’s talk cost.

Cost Efficiency: Who’s Actually Cheap?

Cost per 1,000 tokens (in CNY)
Model                    | Cost/1K     | Value Rating
-------------------------|-------------|---------------
GPT-5.1-codex-mini       | ¥0.043      | Excellent
Kimi-K2.5                | ¥0.255      | Good
DeepSeek-V3.2            | ¥0.267      | Good
MiniMax-M2.1             | ¥0.348      | Good
Claude-Haiku-4.5         | ¥0.40       | Fair
Claude-Sonnet-4.5        | ¥1.0        | Premium
Claude-Opus-4.6          | ¥1.996      | Luxury

The price spread is 46x between the cheapest and most expensive. GPT-5.1-codex-mini costs ¥0.043 per 1K tokens, while Claude-Opus-4.6 costs ¥1.996. For 1 million tokens of processing:

Cost for 1 million tokens
GPT-5.1-codex-mini: ¥43
Claude-Opus-4.6: ¥1,996
Savings: ¥1,953 (98% cheaper)
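The 1-million-token comparison is plain per-1K arithmetic. Here is a quick sketch that reproduces it, using the prices from the cost table (in CNY):

```python
def cost_for(tokens: int, price_per_1k: float) -> float:
    """Cost in CNY for a number of tokens at a given price per 1K tokens."""
    return tokens / 1000 * price_per_1k


TOKENS = 1_000_000
codex_mini = cost_for(TOKENS, 0.043)   # GPT-5.1-codex-mini
opus = cost_for(TOKENS, 1.996)         # Claude-Opus-4.6
savings = opus - codex_mini

print(f"Price spread: {1.996 / 0.043:.0f}x")
print(f"¥{codex_mini:,.0f} vs ¥{opus:,.0f}: save ¥{savings:,.0f} ({savings / opus:.0%})")
```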

But here’s the catch: cheap models may struggle with complex reasoning. I learned this the hard way when I tried using the budget model for architecture decisions and got superficial suggestions.

The Trade-off Matrix

I created a decision matrix to help me choose:

Model Selection Logic
# Model data from the benchmark (speed in tokens/s, cost in CNY per 1K tokens)
models = {
    "DeepSeek-V3.2": {"speed": 34.6, "cost_per_1k": 0.267, "use_case": "reasoning"},
    "Kimi-K2.5": {"speed": 51.0, "cost_per_1k": 0.255, "use_case": "speed"},
    "Claude-Haiku-4.5": {"speed": 46.9, "cost_per_1k": 0.40, "use_case": "balanced"},
    "MiniMax-M2.1": {"speed": 85.4, "cost_per_1k": 0.348, "use_case": "fast"},
    "GPT-5.1-codex-mini": {"speed": 69.8, "cost_per_1k": 0.043, "use_case": "budget"},
    "Claude-Opus-4.6": {"speed": 28.1, "cost_per_1k": 1.996, "use_case": "quality"},
}
def recommend_model(priority, budget_per_1k=1.0):
    """Recommend a model for a priority ("speed", "cost", or anything else for balanced).

    Returns a (name, stats) tuple, or (None, None) if no model fits the budget.
    """
    # Keep only models whose price per 1K tokens fits the budget
    filtered = {k: v for k, v in models.items()
                if v["cost_per_1k"] <= budget_per_1k}
    if not filtered:
        return None, None
    if priority == "speed":
        return max(filtered.items(), key=lambda x: x[1]["speed"])
    elif priority == "cost":
        return min(filtered.items(), key=lambda x: x[1]["cost_per_1k"])
    else:  # balanced: most tokens/s per yuan of cost (quality is not modeled)
        return max(filtered.items(),
                   key=lambda x: x[1]["speed"] / x[1]["cost_per_1k"])
# My typical use cases (the default ¥1.0/1K budget excludes Claude-Opus-4.6)
print(recommend_model("speed")[0])     # MiniMax-M2.1
print(recommend_model("cost")[0])      # GPT-5.1-codex-mini
print(recommend_model("balanced")[0])  # GPT-5.1-codex-mini (best speed/cost ratio)

This helped me realize I needed different models for different tasks.

My Actual Workflow Now

I don’t use one model for everything anymore. Instead, I match the model to the task:

Task-Based Model Selection
Task Type              | Model Choice           | Why
-----------------------|------------------------|---------------------------
Quick code completion  | GPT-5.1-codex-mini     | Fast + cheap
Code review            | Claude-Haiku-4.5       | Good enough + fast
Architecture decisions | Claude-Opus-4.6        | Need maximum quality
Documentation drafts   | DeepSeek-V3.2          | Reasoning + reasonable cost
Real-time chat         | MiniMax-M2.1           | Speed priority
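In code, this table is a lookup from task type to model. The routing sketch below is illustrative. The task keys and the `route` helper are hypothetical names. Falling back to Claude-Haiku-4.5 for unlisted tasks is an assumption, based on using Haiku as the everyday default.

```python
# Task-to-model routing, mirroring the table above (task keys are hypothetical labels)
TASK_ROUTES = {
    "completion": "GPT-5.1-codex-mini",    # fast + cheap
    "code_review": "Claude-Haiku-4.5",     # good enough + fast
    "architecture": "Claude-Opus-4.6",     # need maximum quality
    "docs": "DeepSeek-V3.2",               # reasoning + reasonable cost
    "chat": "MiniMax-M2.1",                # speed priority
}
DEFAULT_MODEL = "Claude-Haiku-4.5"  # assumption: everyday fallback for unlisted tasks


def route(task_type: str) -> str:
    """Return the model for a task type, or the default if it isn't listed."""
    return TASK_ROUTES.get(task_type, DEFAULT_MODEL)


print(route("architecture"))  # Claude-Opus-4.6
print(route("refactoring"))   # Claude-Haiku-4.5 (not listed, so the default)
```

Keeping the mapping in one dictionary means changing a task's model is a one-line edit, not a hunt through call sites.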

This multi-model approach cut my API costs by 60% while actually improving my overall experience because I’m not waiting on slow responses for simple tasks.

What I Got Wrong Initially

My first mistake was thinking “faster is always better.” MiniMax-M2.1 is the fastest at 85.4 t/s, but when I used it for complex code reasoning, the quality wasn’t there. I had to re-prompt multiple times, which actually made it slower overall.

My second mistake was over-optimizing for cost. I switched everything to GPT-5.1-codex-mini because it was 46x cheaper than Claude-Opus-4.6. Then I spent hours debugging bad suggestions for edge cases. The time lost far exceeded the money saved.

The sweet spot turned out to be task-specific selection. I use:

  • Budget model for simple, high-volume tasks (completions, formatting)
  • Balanced model for everyday coding (Haiku for most things)
  • Quality model for critical decisions (Opus for architecture and high-stakes reviews)

The Numbers That Matter

Here’s the summary of what I tested:

Complete Benchmark Data
Model                    | Speed    | Cost/1K   | Speed/Cost Ratio
-------------------------|----------|-----------|------------------
MiniMax-M2.1             | 85.4 t/s | ¥0.348    | 245.4
GPT-5.1-codex-mini       | 69.8 t/s | ¥0.043    | 1623.3 (best value)
Kimi-K2.5                | 51.0 t/s | ¥0.255    | 200.0
Claude-Haiku-4.5         | 46.9 t/s | ¥0.40     | 117.3
DeepSeek-V3.2            | 34.6 t/s | ¥0.267    | 129.6
Claude-Sonnet-4.5        | 28.2 t/s | ¥1.0      | 28.2
Claude-Opus-4.6          | 28.1 t/s | ¥1.996    | 14.1
GPT-5.1                  | 17.9 t/s | N/A       | N/A

The “speed/cost ratio” is my homemade metric: tokens per second divided by cost per 1K tokens (in CNY). Higher is better. GPT-5.1-codex-mini dominates this metric, but remember that it doesn’t account for quality.
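The ratio column can be recomputed from the raw speed and price columns. The sketch below also ranks the models by ratio. GPT-5.1 is skipped because the test recorded no price for it.

```python
# (tokens/second, CNY per 1K tokens) from the table above; GPT-5.1 had no price
data = {
    "MiniMax-M2.1": (85.4, 0.348),
    "GPT-5.1-codex-mini": (69.8, 0.043),
    "Kimi-K2.5": (51.0, 0.255),
    "Claude-Haiku-4.5": (46.9, 0.40),
    "DeepSeek-V3.2": (34.6, 0.267),
    "Claude-Sonnet-4.5": (28.2, 1.0),
    "Claude-Opus-4.6": (28.1, 1.996),
    "GPT-5.1": (17.9, None),
}


def speed_cost_ratio(speed, cost_per_1k):
    """Tokens/second per yuan of cost per 1K tokens; None when the cost is unknown."""
    return None if cost_per_1k is None else speed / cost_per_1k


ratios = {m: speed_cost_ratio(s, c) for m, (s, c) in data.items()}
# Rank only the models with a known ratio, best value first
ranked = sorted((m for m in ratios if ratios[m] is not None),
                key=lambda m: ratios[m], reverse=True)

for m in ranked:
    print(f"{m:<20} {ratios[m]:8.0f}")
```

Ranked by this ratio, DeepSeek-V3.2 (about 130) edges out Claude-Haiku-4.5 (about 117). The table lists Haiku first only because it is sorted by speed.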

Quick Selection Guide

If you just want a recommendation:

Decision Guide
| Priority          | Model               | Speed    | Cost      | Why                    |
|-------------------|---------------------|----------|-----------|------------------------|
| Maximum speed     | MiniMax-M2.1        | 85.4 t/s | ¥0.348/1K | Fastest in test        |
| Budget coding     | GPT-5.1-codex-mini  | 69.8 t/s | ¥0.043/1K | Cheapest, still fast   |
| Best balance      | Claude-Haiku-4.5    | 46.9 t/s | ¥0.40/1K  | Quality + speed        |
| Complex reasoning | Claude-Opus-4.6     | 28.1 t/s | ¥1.996/1K | Highest quality        |
| Fast + capable    | Kimi-K2.5           | 51.0 t/s | ¥0.255/1K | Speed + reasoning      |

What This Doesn’t Cover

This benchmark measures speed and cost. It doesn’t measure:

  1. Code quality: Can the model actually write good code?
  2. Instruction following: How well does it follow complex prompts?
  3. Context handling: How does it perform with long contexts?
  4. Specialized tasks: Performance on specific domains (SQL, Rust, etc.)

For those, you need task-specific benchmarks. But for understanding raw throughput and cost efficiency, these numbers tell the story.

The Takeaway

Model selection is a three-variable optimization problem: speed, cost, and quality. You can’t maximize all three simultaneously. I usually pick two based on the task, and for critical decisions I optimize for quality alone:

  • Speed + Cost → GPT-5.1-codex-mini (for simple, high-volume work)
  • Speed + Quality → Claude-Haiku-4.5 (for everyday coding)
  • Cost + Quality → DeepSeek-V3.2 (for reasoning tasks on a budget)
  • Quality only → Claude-Opus-4.6 (for critical decisions)
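The "pick two" rule works as a lookup keyed on the set of priorities. Here is a small sketch. `PICKS` and `pick` are hypothetical names. Raising an error for combinations the list doesn't cover, such as asking for all three, is a design choice that mirrors the point that you can’t maximize everything at once.

```python
# "Pick two" lookup from the list above; frozenset keys make the order irrelevant
PICKS = {
    frozenset({"speed", "cost"}): "GPT-5.1-codex-mini",
    frozenset({"speed", "quality"}): "Claude-Haiku-4.5",
    frozenset({"cost", "quality"}): "DeepSeek-V3.2",
    frozenset({"quality"}): "Claude-Opus-4.6",
}


def pick(*priorities: str) -> str:
    """Return the model for a set of priorities, e.g. pick("quality", "speed")."""
    key = frozenset(priorities)
    if key not in PICKS:
        raise ValueError(f"no recommendation for priorities: {sorted(key)}")
    return PICKS[key]


print(pick("cost", "speed"))  # GPT-5.1-codex-mini
print(pick("quality"))        # Claude-Opus-4.6
```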

The right choice depends on what you’re optimizing for. But you can’t make that choice without the data. Now you have it.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.

