AI Model Performance Benchmark: UCloud Speed and Cost Comparison 2026
I wasted an entire afternoon waiting for AI responses. My workflow was simple: ask a question, wait, get a response, repeat. But by the fifth “thinking…” spinner, I realized the model I picked was killing my productivity.
The real problem? I optimized for quality but ignored speed and cost. I picked the most capable model without considering that I’d make hundreds of API calls daily. When each call takes 20+ seconds, those delays compound into hours of lost time.
So I ran a benchmark. I tested 15+ models through UCloud’s standardized environment with identical prompts to measure actual throughput (tokens/second) and cost efficiency. Here’s what I found.
The Benchmark Setup
I used a creative writing prompt requesting approximately 200 words. Sending the identical prompt to every model keeps the comparison consistent, although a single task cannot show how each model behaves on other workloads. The test measured:
- Speed: Tokens generated per second (t/s)
- Cost: Price per 1,000 tokens (in ¥)
- Latency: Total time to complete the request
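Here is a minimal sketch of how one request could be timed to get the measurements listed above. It is not the harness used for this benchmark: `measure_request`, `fake_stream`, and the injectable `clock` parameter are hypothetical names for illustration, and `fake_stream` stands in for a real streaming API call.

```python
import time
from typing import Callable, Iterable


def measure_request(stream_tokens: Callable[[str], Iterable[str]],
                    prompt: str,
                    price_per_1k: float,
                    clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time one request and derive speed, cost, and total time."""
    start = clock()
    token_count = sum(1 for _ in stream_tokens(prompt))  # consume the whole stream
    elapsed = clock() - start
    return {
        "tokens": token_count,
        "total_time_s": elapsed,
        "tokens_per_s": token_count / elapsed if elapsed > 0 else float("inf"),
        "cost": token_count / 1000 * price_per_1k,
    }


def fake_stream(prompt: str):
    """Hypothetical stand-in for a model API: yields 200 placeholder tokens."""
    for word in ("lorem " * 200).split():
        yield word


result = measure_request(fake_stream, "Write about 200 words.", price_per_1k=0.348)
print(result["tokens"], "tokens generated")
```

Passing the clock in as a parameter makes the function easy to check with fixed timestamps instead of real elapsed time.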
The results surprised me.
Speed Rankings: Who’s Actually Fast?
Model | Speed (t/s) | Relative to GPT-5.1
-------------------------|-------------|--------------------
MiniMax-M2.1 | 85.4 | 4.8x faster
GPT-5.1-codex-mini | 69.8 | 3.9x faster
Kimi-K2.5 | 51.0 | 2.8x faster
Claude-Haiku-4.5 | 46.9 | 2.6x faster
DeepSeek-V3.2 | 34.6 | 1.9x faster
Claude-Sonnet-4.5 | 28.2 | 1.6x faster
Claude-Opus-4.6 | 28.1 | 1.6x faster
GPT-5.1 | 17.9 | baseline

The speed difference is massive. MiniMax-M2.1 generates tokens nearly 5x faster than GPT-5.1. For a typical 500-token code completion, that’s:

- MiniMax-M2.1: ~5.9 seconds
- GPT-5.1: ~27.9 seconds
- Difference: 22 seconds per request

If I make 100 completions per day, that’s 36+ minutes saved daily, or 220+ hours per year if I work every day.
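The arithmetic behind those savings can be checked with a short sketch. It assumes generation time is simply tokens divided by tokens per second (ignoring network latency and time to first token) and 365 days of use per year. The function and constant names are mine.

```python
def seconds_per_request(tokens: int, tokens_per_second: float) -> float:
    """Generation time for one request at a steady token rate."""
    return tokens / tokens_per_second


TOKENS = 500            # typical code completion size from the text
REQUESTS_PER_DAY = 100  # daily volume from the text

fast = seconds_per_request(TOKENS, 85.4)  # MiniMax-M2.1
slow = seconds_per_request(TOKENS, 17.9)  # GPT-5.1

saved_per_request = slow - fast
saved_minutes_per_day = saved_per_request * REQUESTS_PER_DAY / 60
saved_hours_per_year = saved_minutes_per_day * 365 / 60  # assumes daily use

print(f"Per request: {saved_per_request:.1f} s saved")
print(f"Per day:     {saved_minutes_per_day:.1f} min saved")
print(f"Per year:    {saved_hours_per_year:.0f} h saved")
```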
But speed isn’t everything. Let’s talk cost.
Cost Efficiency: Who’s Actually Cheap?
Model | Cost/1K | Value Rating
-------------------------|-------------|---------------
GPT-5.1-codex-mini | ¥0.043 | Excellent
Kimi-K2.5 | ¥0.255 | Good
DeepSeek-V3.2 | ¥0.267 | Good
MiniMax-M2.1 | ¥0.348 | Good
Claude-Haiku-4.5 | ¥0.40 | Fair
Claude-Sonnet-4.5 | ¥1.0 | Premium
Claude-Opus-4.6 | ¥1.996 | Luxury

The price spread is 46x between the cheapest and most expensive. GPT-5.1-codex-mini costs ¥0.043 per 1K tokens, while Claude-Opus-4.6 costs ¥1.996. For 1 million tokens of processing:

- GPT-5.1-codex-mini: ¥43
- Claude-Opus-4.6: ¥1,996
- Savings: ¥1,953 (98% cheaper)

But here’s the catch: cheap models may struggle with complex reasoning. I learned this the hard way when I tried using the budget model for architecture decisions and got superficial suggestions.
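The 1-million-token comparison above works out as follows. This is a small sketch using only the two prices quoted in the table. `cost_yuan` is a helper name I made up, and it assumes a single flat price per 1K tokens with no separate input and output rates.

```python
def cost_yuan(tokens: int, price_per_1k: float) -> float:
    """Total cost in yuan for a given number of tokens at a flat per-1K price."""
    return tokens / 1000 * price_per_1k


TOKENS = 1_000_000

cheap = cost_yuan(TOKENS, 0.043)    # GPT-5.1-codex-mini
premium = cost_yuan(TOKENS, 1.996)  # Claude-Opus-4.6

spread = premium / cheap
savings = premium - cheap
savings_pct = savings / premium * 100

print(f"Cheapest: ¥{cheap:,.0f}, most expensive: ¥{premium:,.0f}")
print(f"Spread: {spread:.1f}x, savings: ¥{savings:,.0f} ({savings_pct:.0f}%)")
```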
The Trade-off Matrix
I created a decision matrix to help me choose:
    # Model data from the benchmark (speed in tokens/second, cost in ¥ per 1K tokens)
    models = {
        "DeepSeek-V3.2": {"speed": 34.6, "cost_per_1k": 0.267, "use_case": "reasoning"},
        "Kimi-K2.5": {"speed": 51.0, "cost_per_1k": 0.255, "use_case": "speed"},
        "Claude-Haiku-4.5": {"speed": 46.9, "cost_per_1k": 0.40, "use_case": "balanced"},
        "MiniMax-M2.1": {"speed": 85.4, "cost_per_1k": 0.348, "use_case": "fast"},
        "GPT-5.1-codex-mini": {"speed": 69.8, "cost_per_1k": 0.043, "use_case": "budget"},
        "Claude-Opus-4.6": {"speed": 28.1, "cost_per_1k": 1.996, "use_case": "quality"},
    }


    def recommend_model(priority, budget_per_1k=1.0):
        """Recommend a model based on priority and budget.

        Returns a (name, data) tuple, or (None, None) if nothing fits the budget.
        """
        filtered = {k: v for k, v in models.items() if v["cost_per_1k"] <= budget_per_1k}

        if not filtered:
            return None, None

        if priority == "speed":
            return max(filtered.items(), key=lambda x: x[1]["speed"])
        elif priority == "cost":
            return min(filtered.items(), key=lambda x: x[1]["cost_per_1k"])
        else:  # balanced: highest speed per unit of cost
            return max(filtered.items(), key=lambda x: x[1]["speed"] / x[1]["cost_per_1k"])


    # My typical use cases (the default budget of ¥1.0/1K excludes Claude-Opus-4.6)
    print(recommend_model("speed")[0])     # MiniMax-M2.1
    print(recommend_model("cost")[0])      # GPT-5.1-codex-mini
    print(recommend_model("balanced")[0])  # GPT-5.1-codex-mini (best speed/cost ratio)

Because the "balanced" score ignores quality, the budget model wins that category as well. This helped me realize I needed different models for different tasks.
My Actual Workflow Now
I don’t use one model for everything anymore. Instead, I match the model to the task:
Task Type | Model Choice | Why
-----------------------|------------------------|---------------------------
Quick code completion | GPT-5.1-codex-mini | Fast + cheap
Code review | Claude-Haiku-4.5 | Good enough + fast
Architecture decisions | Claude-Opus-4.6 | Need maximum quality
Documentation drafts | DeepSeek-V3.2 | Reasoning + reasonable cost
Real-time chat | MiniMax-M2.1 | Speed priority

This multi-model approach cut my API costs by 60% while actually improving my overall experience because I’m not waiting on slow responses for simple tasks.
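In code, a mapping like the one in the table can be as simple as a lookup. This is a sketch only: the task labels and `pick_model` are hypothetical names, and falling back to Claude-Haiku-4.5 for unlisted tasks is my assumption, not something the benchmark measured.

```python
# Task-to-model mapping taken from the table above; the keys are illustrative labels.
TASK_MODEL = {
    "code_completion": "GPT-5.1-codex-mini",
    "code_review": "Claude-Haiku-4.5",
    "architecture": "Claude-Opus-4.6",
    "documentation": "DeepSeek-V3.2",
    "chat": "MiniMax-M2.1",
}

# Assumption: unknown task types fall back to the everyday model.
DEFAULT_MODEL = "Claude-Haiku-4.5"


def pick_model(task_type: str) -> str:
    """Return the model name for a task type, or the default for unknown tasks."""
    return TASK_MODEL.get(task_type, DEFAULT_MODEL)


print(pick_model("architecture"))
print(pick_model("something_else"))
```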
What I Got Wrong Initially
My first mistake was thinking “faster is always better.” MiniMax-M2.1 is the fastest at 85.4 t/s, but when I used it for complex code reasoning, the quality wasn’t there. I had to re-prompt multiple times, which actually made it slower overall.
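A rough way to see why re-prompting erases the speed advantage: multiply the generation time by the number of attempts. The sketch below compares MiniMax-M2.1 against Claude-Opus-4.6 on a 500-token response. It assumes, hypothetically, that the slower model succeeds on the first try, and it ignores the time spent reading output and rewriting prompts, so the real penalty is larger.

```python
def effective_seconds(tokens: int, tokens_per_second: float, attempts: int) -> float:
    """Total generation time when a response has to be regenerated `attempts` times."""
    return attempts * tokens / tokens_per_second


def break_even_attempts(fast_tps: float, slow_tps: float) -> float:
    """Attempts on the fast model that cost as much time as one on the slow model."""
    return fast_tps / slow_tps


TOKENS = 500
one_pass_slow = effective_seconds(TOKENS, 28.1, attempts=1)   # Claude-Opus-4.6
three_tries_fast = effective_seconds(TOKENS, 85.4, attempts=3)  # MiniMax-M2.1
four_tries_fast = effective_seconds(TOKENS, 85.4, attempts=4)

print(f"Break-even at about {break_even_attempts(85.4, 28.1):.2f} attempts")
print(f"Slow model, 1 pass:   {one_pass_slow:.1f} s")
print(f"Fast model, 4 passes: {four_tries_fast:.1f} s")
```

Under these assumptions, the fastest model loses its lead somewhere between the third and fourth attempt.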
My second mistake was over-optimizing for cost. I switched everything to GPT-5.1-codex-mini because it was up to 46x cheaper than the most expensive option. Then I spent hours debugging bad suggestions for edge cases. The time I lost far outweighed the money I saved.
The sweet spot turned out to be task-specific selection. I use:
- Budget model for simple, high-volume tasks (completions, formatting)
- Balanced model for everyday coding (Haiku for most things)
- Quality model for critical decisions (Opus for architecture and high-stakes reviews)
The Numbers That Matter
Here’s the summary of what I tested:
Model | Speed | Cost/1K | Speed/Cost Ratio
-------------------------|----------|-----------|------------------
MiniMax-M2.1 | 85.4 t/s | ¥0.348 | 245.4
GPT-5.1-codex-mini | 69.8 t/s | ¥0.043 | 1623.3 (best value)
Kimi-K2.5 | 51.0 t/s | ¥0.255 | 200.0
Claude-Haiku-4.5 | 46.9 t/s | ¥0.40 | 117.3
DeepSeek-V3.2 | 34.6 t/s | ¥0.267 | 129.6
Claude-Sonnet-4.5 | 28.2 t/s | ¥1.0 | 28.2
Claude-Opus-4.6 | 28.1 t/s | ¥1.996 | 14.1
GPT-5.1 | 17.9 t/s | N/A | N/A

The “speed/cost ratio” is my homemade metric: tokens per second divided by cost per 1K. Higher is better. GPT-5.1-codex-mini dominates this metric, but remember that it doesn’t account for quality.
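For anyone who wants to recompute the ratio or add their own models, here is a short sketch using the speed and cost figures from the table. GPT-5.1 is left out because no price was recorded for it. The variable and function names are mine.

```python
# (tokens/second, ¥ per 1K tokens) from the summary table
BENCHMARK = {
    "MiniMax-M2.1": (85.4, 0.348),
    "GPT-5.1-codex-mini": (69.8, 0.043),
    "Kimi-K2.5": (51.0, 0.255),
    "Claude-Haiku-4.5": (46.9, 0.40),
    "DeepSeek-V3.2": (34.6, 0.267),
    "Claude-Sonnet-4.5": (28.2, 1.0),
    "Claude-Opus-4.6": (28.1, 1.996),
}


def speed_cost_ratio(speed: float, cost_per_1k: float) -> float:
    """Tokens per second divided by cost per 1K tokens; higher is better."""
    return speed / cost_per_1k


# Sort models from best to worst ratio.
ranked = sorted(
    BENCHMARK.items(),
    key=lambda item: speed_cost_ratio(*item[1]),
    reverse=True,
)

for name, (speed, cost) in ranked:
    print(f"{name:<20} {speed_cost_ratio(speed, cost):8.1f}")
```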
Quick Selection Guide
If you just want a recommendation:
| Priority | Model | Speed | Cost | Why |
|-------------------|--------------------|----------|-----------|-------------------|
| Maximum speed | MiniMax-M2.1 | 85.4 t/s | ¥0.348/1K | Fastest in test |
| Budget coding | GPT-5.1-codex-mini | 69.8 t/s | ¥0.043/1K | Cheapest, still fast |
| Best balance | Claude-Haiku-4.5 | 46.9 t/s | ¥0.40/1K | Quality + speed |
| Complex reasoning | Claude-Opus-4.6 | 28.1 t/s | ¥1.996/1K | Highest quality |
| Fast + capable | Kimi-K2.5 | 51.0 t/s | ¥0.255/1K | Speed + reasoning |

What This Doesn’t Cover
This benchmark measures speed and cost. It doesn’t measure:
- Code quality: Can the model actually write good code?
- Instruction following: How well does it follow complex prompts?
- Context handling: How does it perform with long contexts?
- Specialized tasks: Performance on specific domains (SQL, Rust, etc.)
For those, you need task-specific benchmarks. But for understanding raw throughput and cost efficiency, these numbers tell the story.
The Takeaway
Model selection is a three-variable optimization problem: speed, cost, and quality. You can’t maximize all three simultaneously. I pick two based on the task:
- Speed + Cost → GPT-5.1-codex-mini (for simple, high-volume work)
- Speed + Quality → Claude-Haiku-4.5 (for everyday coding)
- Cost + Quality → DeepSeek-V3.2 (for reasoning tasks on a budget)
- Quality only → Claude-Opus-4.6 (for critical decisions)
The right choice depends on what you’re optimizing for. But you can’t make that choice without the data. Now you have it.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 UCloud AI Model Benchmark
- 👨‍💻 Claude Model Comparison
- 👨‍💻 OpenAI GPT Models
- 👨‍💻 DeepSeek AI
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!