How to Get Better Results from Chinese AI Models: Prompting Tips That Work

May 6, 2026

I recently switched from GPT and Claude to Chinese AI models like DeepSeek and Kimi for cost reasons. The first few days were frustrating—the results felt inconsistent, the models seemed “dumber,” and I almost gave up. Then I realized the problem wasn’t the models. It was my prompting.

The Problem: Lazy Prompting Habits

Over the past year, I got spoiled. GPT-4.5 and Claude Opus became so good at inferring intent that I stopped being explicit. I’d write prompts like:

Add a user profile page with avatar upload

And they’d just figure it out. They’d infer the layout, the validation, the error handling, the styling patterns from my codebase.

Chinese models don’t do that. When I used the same lazy prompts with DeepSeek, I got incomplete implementations, missed edge cases, and code that didn’t match my project’s conventions.

The Insight: It’s About Training Data, Not Intelligence

After digging through Reddit discussions and experimenting, I found the key difference: Western companies spend heavily on fine-tuning for instruction following—both explicit and implicit.

One developer put it well: “If you know how to work with the models, you can get similar results from the Chinese models, but prompting/workflow is more important.”

The models are genuinely capable. But the harness—the system prompts, file context rules, response formats—matters more than with Western models.

Five Prompting Techniques That Actually Work

1. Always Use Plan Mode First

This was the single biggest improvement. Instead of jumping straight to code, I ask the model to plan:

I need to add a user profile page. Before writing any code, please:
1. Analyze my existing page structure and patterns
2. Identify what components I'll need
3. List the API endpoints required
4. Note any edge cases to handle

As one commenter said: “These models are not too bright, but a plan goes a long way.”

The plan gives the model structure to follow. It’s like giving a junior developer a spec document instead of just a feature name.
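
One way to make the plan-first habit repeatable is to wrap it in a small two-phase helper. The Python sketch below is only an illustration: `call_model` is a hypothetical stand-in for whatever client you use to reach the model, and the wording of the implementation prompt is an assumption, not something taken from a particular tool.

```python
from typing import Callable

# The four planning questions from the prompt above.
PLAN_STEPS = [
    "Analyze my existing page structure and patterns",
    "Identify what components I'll need",
    "List the API endpoints required",
    "Note any edge cases to handle",
]


def build_plan_prompt(feature: str) -> str:
    """Ask for a plan only, with no code yet."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(PLAN_STEPS, start=1))
    return f"I need to {feature}. Before writing any code, please:\n{numbered}"


def build_implementation_prompt(feature: str, plan: str) -> str:
    """Feed the plan back so the model has a spec to follow (assumed wording)."""
    return (
        f"Implement the following feature: {feature}\n"
        f"Follow this plan and flag any step you cannot complete:\n{plan}"
    )


def plan_then_implement(feature: str, call_model: Callable[[str], str]) -> str:
    # call_model is hypothetical: any function that sends a prompt and returns text.
    plan = call_model(build_plan_prompt(feature))
    return call_model(build_implementation_prompt(feature, plan))
```

In practice you would probably read and edit the plan between the two calls rather than passing it straight through.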

2. Be Explicit About Everything

Here’s the prompting pattern comparison that changed my results:

# Lazy prompting (works with GPT/Claude, fails with Chinese models)
"Add a user profile page with avatar upload"

# Explicit prompting (works with Chinese models)
"Add a user profile page with the following requirements:
1. Layout: Two-column design (sidebar with avatar, main content with form)
2. Avatar: Support JPG/PNG, max 2MB, show preview before upload
3. Form fields: name, email, bio (textarea, 500 char max)
4. Validation: Real-time validation with error messages below each field
5. API endpoint: POST /api/user/profile
6. Style: Use existing Tailwind classes, match the settings page pattern
7. Edge cases: Handle network errors, show loading state during upload"

With DeepSeek, the explicit prompt gave me production-ready code in one shot. The lazy prompt required three iterations of corrections.
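
If you write this kind of prompt often, it may help to keep the requirements as data and render the prompt from them. The sketch below is a minimal Python illustration; the `FeatureSpec` class and the `CHECKLIST` of sections are hypothetical names introduced here, and the checklist itself is just one guess at which sections are worth enforcing.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical checklist: sections a prompt should not leave implicit.
CHECKLIST = ["Layout", "Validation", "API endpoint", "Style", "Edge cases"]


@dataclass
class FeatureSpec:
    summary: str
    requirements: Dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> List[str]:
        """Return checklist sections that have not been spelled out yet."""
        return [name for name in CHECKLIST if name not in self.requirements]

    def to_prompt(self) -> str:
        """Render the summary plus a numbered list of explicit requirements."""
        lines = [f"{self.summary} with the following requirements:"]
        for i, (label, detail) in enumerate(self.requirements.items(), start=1):
            lines.append(f"{i}. {label}: {detail}")
        return "\n".join(lines)


spec = FeatureSpec(
    summary="Add a user profile page",
    requirements={
        "Layout": "Two-column design (sidebar with avatar, main content with form)",
        "Avatar": "Support JPG/PNG, max 2MB, show preview before upload",
        "Form fields": "name, email, bio (textarea, 500 char max)",
        "Validation": "Real-time validation with error messages below each field",
        "API endpoint": "POST /api/user/profile",
        "Style": "Use existing Tailwind classes, match the settings page pattern",
        "Edge cases": "Handle network errors, show loading state during upload",
    },
)
prompt = spec.to_prompt()
```

Calling `spec.missing_sections()` before sending the prompt is a cheap way to catch a lazy prompt before the model does.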

3. Break Complex Tasks into Steps

I used to ask for entire features in one go. Now I break it down:

1. Create the data models
2. Implement the API endpoints
3. Add the frontend components
4. Write tests

Each step gets the model’s full attention. The quality improvement is dramatic.
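
The step-by-step approach can be scripted as a simple loop that sends one step per request and reminds the model what is already done. This is a rough Python sketch under assumptions: `call_model` is a hypothetical function that sends a prompt and returns the model's text, and the prompt wording is illustrative only.

```python
from typing import Callable, List

STEPS = [
    "Create the data models",
    "Implement the API endpoints",
    "Add the frontend components",
    "Write tests",
]


def run_in_steps(
    feature: str, steps: List[str], call_model: Callable[[str], str]
) -> List[str]:
    """Send one focused prompt per step and collect the outputs in order."""
    outputs: List[str] = []
    for number, step in enumerate(steps, start=1):
        # Summarize earlier steps so the model knows what already exists.
        done = "\n".join(f"- {s}" for s in steps[: number - 1]) or "- (nothing yet)"
        prompt = (
            f"Feature: {feature}\n"
            f"Already completed:\n{done}\n"
            f"Now do step {number} of {len(steps)} only: {step}"
        )
        outputs.append(call_model(prompt))
    return outputs
```

A real workflow would likely review each output before moving on, and might paste the relevant code from earlier steps into the next prompt instead of only listing step names.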

4. Optimize Your Agent Harness

The model is only as good as its context. I invested time configuring:

System prompts: Clear role definition and output format requirements
File context rules: Which files to include, which to exclude
Response formats: Structured output instead of freeform text

This upfront investment paid off quickly. The harness optimization reduced my iteration cycles by half.
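
To make those three settings concrete, here is a small Python sketch of a harness configuration. Everything in it is an assumption for illustration: `HarnessConfig`, its field names, and the glob patterns are hypothetical and do not correspond to any specific agent tool's config format.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class HarnessConfig:
    system_prompt: str                                 # role definition
    include: List[str] = field(default_factory=list)  # glob patterns to send
    exclude: List[str] = field(default_factory=list)  # glob patterns to skip
    response_format: str = "JSON"                      # structured output

    def system_message(self) -> str:
        """Combine the role definition with the output format requirement."""
        return f"{self.system_prompt}\nRespond only with {self.response_format}."

    def select_files(self, paths: List[str]) -> List[str]:
        """Keep paths that match an include pattern and no exclude pattern."""
        return [
            p
            for p in paths
            if any(fnmatchcase(p, pat) for pat in self.include)
            and not any(fnmatchcase(p, pat) for pat in self.exclude)
        ]


config = HarnessConfig(
    system_prompt="You are a senior frontend engineer working in this repo.",
    include=["src/*.tsx"],
    exclude=["*.test.tsx", "node_modules/*"],
)
selected = config.select_files(
    [
        "src/pages/Settings.tsx",
        "src/pages/Settings.test.tsx",
        "node_modules/react/index.js",
        "README.md",
    ]
)
```

Note that `fnmatchcase` lets `*` match across `/`, so `src/*.tsx` also covers nested folders; a stricter harness might want path-aware matching instead.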

5. Match Model to Task

Different Chinese models have different strengths:

Kimi: Frontend work, UI components
DeepSeek: Backend logic, API design
GLM: Debugging, code explanation

Using the right model for the task matters more than with Western models.
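
If you switch between models regularly, the mapping above can live in a tiny routing table. The Python sketch below is illustrative only: the strings are placeholder labels rather than real API model identifiers, and falling back to DeepSeek for unknown task types is an assumption, not a recommendation from the comparison above.

```python
# Placeholder labels, not real API model IDs.
MODEL_FOR_TASK = {
    "frontend": "kimi",
    "backend": "deepseek",
    "debugging": "glm",
}


def pick_model(task_type: str, default: str = "deepseek") -> str:
    """Return the model label for a task type, or the (assumed) default."""
    return MODEL_FOR_TASK.get(task_type.strip().lower(), default)
```

For example, `pick_model("Frontend")` resolves to the Kimi label, while an unlisted task type such as `"docs"` falls through to the default.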

Why This Matters: Closing the Productivity Gap

Before adjusting my prompting, working with Chinese models felt about 5x slower than working with Claude. After optimization, the gap shrank to about 10%. The cost savings? Over 90%.

The key insight: Western models were trained on massive feedback loops from expensive human labelers ($40-150/hr), which is how they learned to infer intent. Chinese models can achieve similar results, but you need to provide the context they lack.

Common Mistakes to Avoid

I made all of these:

Using the same prompts that work with Claude/GPT
Skipping the planning phase because “it takes extra time”
Not investing in harness setup because “I’ll just iterate”
Expecting the model to read my mind about edge cases

The extra 2 minutes spent on explicit prompting saves 20 minutes of corrections.

If you’re working with Chinese AI models, you might also find these approaches helpful:

Chain-of-thought prompting: Ask the model to explain its reasoning step by step
Few-shot examples: Include 1-2 examples of the output format you want
Self-review checks: Ask the model to review its own output before finalizing

These techniques help bridge the gap in implicit instruction following.
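
Two of these are easy to template: including examples of the output format, and asking the model to review its own output. The Python sketch below shows one possible shape for each prompt. The function names and the exact wording are hypothetical, chosen only to illustrate the idea.

```python
from typing import List, Tuple


def build_few_shot_prompt(task: str, examples: List[Tuple[str, str]]) -> str:
    """Prefix the real task with (input, output) pairs showing the format."""
    parts = []
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(
            f"Example {i} input:\n{example_input}\n"
            f"Example {i} output:\n{example_output}"
        )
    parts.append(f"Now handle this input in the same output format:\n{task}")
    return "\n\n".join(parts)


def build_review_prompt(draft: str, requirements: List[str]) -> str:
    """Ask the model to check its own draft against explicit requirements."""
    checklist = "\n".join(f"- {r}" for r in requirements)
    return (
        "Review the draft below against each requirement. "
        "List every requirement it fails, then output a corrected version.\n"
        f"Requirements:\n{checklist}\n\nDraft:\n{draft}"
    )
```

The review prompt works best when the requirements list is the same one used in the original request, so the model is checked against what it was actually asked to do.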

Final Thoughts

Chinese AI models aren’t worse—they’re different. The productivity gap exists because we’ve adapted our prompting to Western models’ strengths. With explicit prompting, structured planning, and optimized workflows, Chinese models can deliver comparable quality at a fraction of the cost.

The future isn’t about which model is “better.” It’s about adapting your workflow to each model’s capabilities. Once you do that, the cost-performance equation shifts dramatically in favor of Chinese alternatives.
