Building Agentic Coding Workflows with Budget AI Models: MiniMax, GLM, and Kimi Tested

I was burning through my Opus budget running multiple AI agents in parallel. Each agent session consumed tokens like crazy, and my monthly bill was approaching $400. I needed to find budget models that could serve as executors while keeping Opus for planning.

After testing MiniMax, GLM, and Kimi for two weeks with my agent workflows, I found a clear winner: MiniMax 2.7 for execution, Opus for planning.

The Problem

Running AI agents continuously is expensive. Here’s what my typical workflow looked like:

Token Usage Breakdown
+------------------+------------------+------------------+
| Task             | Tokens/Session   | Cost/Session     |
+------------------+------------------+------------------+
| Planning phase   | ~50k             | $0.75 (Opus)     |
| Execution phase  | ~200k            | $3.00 (Opus)     |
| Review phase     | ~30k             | $0.45 (Opus)     |
+------------------+------------------+------------------+
| Total per agent  | ~280k            | $4.20            |
+------------------+------------------+------------------+

With 5-10 agent sessions per day, costs exploded. The execution phase consumed the most tokens but required the least creativity. That’s when I started testing budget models as executors.
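
The per-session figures in the table all work out to one flat rate of $15 per million tokens ($0.75 for ~50k). Here is a minimal sketch of that arithmetic, assuming a single blended rate rather than separate input and output prices. The names are illustrative.

```python
# Flat rate implied by the table above: $0.75 for ~50k tokens = $15 per 1M tokens.
# Assumption: one blended rate; real pricing usually splits input and output tokens.
OPUS_USD_PER_MILLION = 15.00

SESSION_TOKENS = {
    "planning": 50_000,
    "execution": 200_000,
    "review": 30_000,
}


def phase_cost(tokens: int, usd_per_million: float = OPUS_USD_PER_MILLION) -> float:
    """Cost in USD for a token count at a flat per-million rate."""
    return tokens * usd_per_million / 1_000_000


costs = {phase: phase_cost(tokens) for phase, tokens in SESSION_TOKENS.items()}
session_total = sum(costs.values())
execution_share = costs["execution"] / session_total

for phase, cost in costs.items():
    print(f"{phase:<10} ${cost:.2f}")
print(f"total      ${session_total:.2f}")
print(f"execution share of cost: {execution_share:.0%}")
```

Execution accounts for roughly 71% of each session's cost, which is why it is the phase worth moving to a cheaper model.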

What I Tested

I evaluated three Chinese budget models for agent execution:

Budget Model Comparison
+-------------------+------------------+------------------+------------------+
| Feature           | MiniMax 2.7      | GLM 5.1          | Kimi K2.5        |
+-------------------+------------------+------------------+------------------+
| Speed (TPS)       | 100 constant     | 30-50 variable   | 40-60 variable   |
| Context window    | 32k              | 128k             | 100k+            |
| Quota system      | 1500/5hr window  | Daily limits     | Variable         |
| Weekly cap        | None             | Yes              | Varies           |
| Cost efficiency   | Excellent        | Good             | Moderate         |
| Agent reliability | High             | Medium           | Medium           |
+-------------------+------------------+------------------+------------------+

Why MiniMax Won

The Reddit discussion in r/opencodeCLI confirmed my experience:

“for a day-to-day driver minimax is my go-to if im going for cheap”

“minimax offers highspeed and 100tps constant all the time no matter what time of day is it”

Speed matters for agents. When an agent is executing a plan step-by-step, waiting for slow responses kills productivity. MiniMax’s consistent 100 TPS meant my agents kept moving without bottlenecks.
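
To put the throughput numbers in concrete terms, here is a rough sketch of how long one agent step takes to stream at each model's quoted speed. The 2,000-token step size is a hypothetical figure, and the estimate ignores network latency and queueing.

```python
# Throughput ranges (tokens per second) from the comparison table above.
SPEEDS_TPS = {
    "MiniMax 2.7": (100, 100),
    "GLM 5.1": (30, 50),
    "Kimi K2.5": (40, 60),
}

STEP_OUTPUT_TOKENS = 2_000  # hypothetical size of one agent step's output


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response, ignoring network latency and queueing."""
    return output_tokens / tokens_per_second


for model, (slowest, fastest) in SPEEDS_TPS.items():
    best = generation_seconds(STEP_OUTPUT_TOKENS, fastest)
    worst = generation_seconds(STEP_OUTPUT_TOKENS, slowest)
    print(f"{model:<12} {best:5.1f}s to {worst:5.1f}s per step")
```

Under these assumptions a step takes 20 seconds on MiniMax and between 40 and about 67 seconds on GLM, and that difference repeats on every step of a plan.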

GLM had infrastructure issues that made it tedious:

“due to quite slow improvements of the infrastructure of glm main provider it’s been too tedious to wait for the code”

The Hybrid Workflow

I now use a three-phase approach:

Agentic Workflow Architecture
+-------------------+     +-------------------+     +-------------------+
|    PLAN PHASE     |     |   CONTEXT PHASE   |     |   EXECUTE PHASE   |
|   (Opus/Sonnet)   |     | (Caliber/Manual)  |     |   (MiniMax 2.7)   |
+-------------------+     +-------------------+     +-------------------+
|                   |     |                   |     |                   |
| Analyze task      |     | Generate context  |     | Read plan         |
| Create plan       |---->| Architecture.md   |---->| Read context      |
| Define constraints|     | Patterns.md       |     | Implement code    |
|                   |     | Dependencies.md   |     | Verify against    |
+-------------------+     +-------------------+     | plan              |
                                                    +-------------------+

Phase 1: Planning with Opus

Opus analyzes the task and creates a detailed plan:

plan-add-user-dashboard.md
# Plan: Add User Dashboard
## Files to Create
- `/app/dashboard/page.tsx`
- `/app/dashboard/components/UserTable.tsx`
- `/app/dashboard/components/SearchFilter.tsx`
## Constraints
- Use existing auth context from `/contexts/AuthContext.tsx`
- Match patterns in `/app/users/page.tsx`
- Use shadcn/ui components (Table, Input, Button)
## Implementation Steps
1. Create page.tsx with layout and auth check
2. Create UserTable with mock data first
3. Add SearchFilter client-side state
4. Connect to /api/users endpoint
## Don't
- Don't add new dependencies
- Don't change auth implementation
- Don't create additional components beyond listed

The key is being explicit. Budget models follow plans well but need clear instructions.
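
One way to hold an executor to the "Don't create additional components beyond listed" rule is to compare the files it touched against the plan. This is a minimal sketch, assuming the plan keeps the `## Files to Create` heading and backticked paths shown above. The function names and the `Pagination.tsx` example path are hypothetical.

```python
import re

PLAN = """\
# Plan: Add User Dashboard
## Files to Create
- `/app/dashboard/page.tsx`
- `/app/dashboard/components/UserTable.tsx`
- `/app/dashboard/components/SearchFilter.tsx`
## Constraints
- Use existing auth context from `/contexts/AuthContext.tsx`
"""


def planned_files(plan_markdown: str) -> set[str]:
    """Collect backticked paths listed under the '## Files to Create' heading."""
    files: set[str] = set()
    in_section = False
    for line in plan_markdown.splitlines():
        if line.startswith("## "):
            # Entering or leaving a second-level section.
            in_section = line.strip() == "## Files to Create"
            continue
        if in_section:
            match = re.match(r"-\s+`([^`]+)`", line.strip())
            if match:
                files.add(match.group(1))
    return files


def unplanned_files(plan_markdown: str, created: set[str]) -> set[str]:
    """Files the executor created that the plan never listed."""
    return created - planned_files(plan_markdown)


created = {
    "/app/dashboard/page.tsx",
    "/app/dashboard/components/Pagination.tsx",  # hypothetical extra file
}
print(unplanned_files(PLAN, created))
```

Paths under `## Constraints` are deliberately ignored, so referencing an existing file there does not count as permission to create it.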

Phase 2: Context Files

This was the breakthrough from the Reddit discussion:

“what also makes a huge diff regardless of model is how good ur context is”

“when the agent actually understands the codebase architecture it stops making dumb mistakes way more often regardless of which budget model ur using”

I generate project-specific context files:

Architecture.md
# Project Structure
- `/app/` - Next.js 14 App Router pages
- `/components/` - Shared React components
- `/lib/` - Utility functions
- `/api/` - API route handlers
## Patterns
- Server Components for data fetching
- Client Components for interactivity ('use client' directive)
- shadcn/ui for UI components
- Tailwind CSS for styling
## Key Files
- `/contexts/AuthContext.tsx` - Auth state management
- `/lib/api.ts` - API helper functions

Caliber (an open-source tool) auto-generates these context files per repository. The Reddit poster mentioned hitting 250 stars and 90 PRs merged with this approach.
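
For the manual route, a starting skeleton can be produced from the directory layout alone. The sketch below is not how Caliber works (its internals are not described here). It is a hypothetical helper that lists top-level directories in the same format as the Architecture.md example and leaves a TODO wherever no description was supplied.

```python
from pathlib import Path


def structure_section(repo_root: Path, notes: dict[str, str]) -> str:
    """Render a '# Project Structure' list from a repo's top-level directories."""
    lines = ["# Project Structure"]
    for entry in sorted(repo_root.iterdir()):
        # Skip plain files and hidden directories such as .git.
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        note = notes.get(entry.name, "TODO: describe")
        lines.append(f"- `/{entry.name}/` - {note}")
    return "\n".join(lines) + "\n"
```

Calling `structure_section(Path("."), {"app": "Next.js 14 App Router pages"})` from the repository root returns the list as a string. The descriptions still have to come from someone who knows the codebase.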

Phase 3: Execution with MiniMax

The executor prompt template:

executor-prompt-template.md
You are implementing a pre-planned feature.
CONTEXT FILES:
- Architecture: See [Architecture.md]
- Patterns: See [Patterns.md]
PLAN:
[Insert plan from planning phase]
INSTRUCTIONS:
1. Read context files first
2. Follow plan exactly
3. Create files listed in plan
4. Verify constraints are met
DO NOT:
- Deviate from the plan
- Add extra features
- Change existing patterns
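
Filling the template by hand gets tedious across many sessions. Here is a minimal sketch of assembling the prompt in code, assuming the context files are inlined into the prompt rather than referenced by name. `build_executor_prompt` and the `--- name ---` separator are illustrative choices.

```python
EXECUTOR_TEMPLATE = """\
You are implementing a pre-planned feature.
CONTEXT FILES:
{context}
PLAN:
{plan}
INSTRUCTIONS:
1. Read context files first
2. Follow plan exactly
3. Create files listed in plan
4. Verify constraints are met
DO NOT:
- Deviate from the plan
- Add extra features
- Change existing patterns
"""


def build_executor_prompt(plan: str, context_files: dict[str, str]) -> str:
    """Inline each context file under a labelled separator, then insert the plan."""
    context = "\n".join(
        f"--- {name} ---\n{body.strip()}" for name, body in context_files.items()
    )
    return EXECUTOR_TEMPLATE.format(context=context, plan=plan.strip())


prompt = build_executor_prompt(
    plan="# Plan: Add User Dashboard\n## Files to Create\n- `/app/dashboard/page.tsx`\n",
    context_files={"Architecture.md": "# Project Structure\n- `/app/` - Next.js 14 App Router pages\n"},
)
print(prompt)
```

Because the plan and context are passed as arguments rather than pasted into the template string, curly braces inside them (common in TSX snippets) do not interfere with `str.format`.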

Model Selection for Different Agent Tasks

Agent Task Routing Matrix
+------------------------+-------------------+------------------------------+
| Agent Task             | Recommended Model | Reason                       |
+------------------------+-------------------+------------------------------+
| Planning               | Opus/Sonnet       | Deep reasoning, architecture |
| Execution              | MiniMax           | Fast, reliable, generous     |
| Code Review            | GLM 5.1           | Quality analysis             |
| Long Context           | Kimi K2.5         | 100k+ context window         |
| Parallel Agents        | MiniMax           | No weekly cap, 100 TPS       |
+------------------------+-------------------+------------------------------+
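
In an orchestration script the matrix reduces to a lookup table. This is a small sketch. The model strings are placeholder labels, not real API model identifiers, and the task type names are invented for the example.

```python
# Placeholder labels, not real API model identifiers.
ROUTES = {
    "planning": "opus",
    "execution": "minimax",
    "code_review": "glm-5.1",
    "long_context": "kimi-k2.5",
    "parallel_agents": "minimax",
}


def route(task_type: str) -> str:
    """Return the model label for a task type, failing loudly on unknown types."""
    try:
        return ROUTES[task_type]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown task type {task_type!r}; expected one of: {known}") from None


print(route("execution"))
```

Raising on an unknown task type, rather than silently defaulting to the premium model, keeps routing mistakes from showing up as surprise costs.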

What I Tried First (That Didn’t Work)

Attempt 1: Using GLM for Everything

GLM’s context window is impressive, but the speed issues killed agent productivity. Waiting 5-10 seconds between token chunks made the agent feel sluggish.

Attempt 2: Kimi for Long Context

Kimi handles 100k+ tokens, which seemed perfect for large codebases. But the variable speed and quota limits made it unreliable for parallel agent execution.

Attempt 3: No Context Files

I initially pointed MiniMax straight at feature requests with no context files. It made “dumb mistakes”: using the wrong patterns, missing existing utilities, and creating duplicate code.

The context files solved this. Now MiniMax understands the codebase architecture before writing code.

The Cost Savings

Monthly Cost Comparison
+------------------+------------------+------------------+
| Metric           | All Opus         | Hybrid Approach  |
+------------------+------------------+------------------+
| Plan tokens      | 500k             | 500k             |
| Execute tokens   | 3M               | 3M               |
| Review tokens    | 300k             | 300k             |
+------------------+------------------+------------------+
| Plan cost        | $7.50            | $7.50            |
| Execute cost     | $45.00           | $4.00            |
| Review cost      | $4.50            | $4.50            |
+------------------+------------------+------------------+
| Total            | $57.00           | $16.00           |
+------------------+------------------+------------------+
| Savings          | -                | $41.00 (72%)     |
+------------------+------------------+------------------+

The execution phase cost dropped from $45 to $4 per month, a reduction of roughly 91% in the most expensive phase.
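
The comparison table can be reproduced from two per-million-token rates. The sketch below backs both rates out of the table itself: $15 per million for Opus and about $1.33 per million for the executor, which is $4.00 spread over 3M tokens. These are implied figures, not published prices.

```python
OPUS_USD_PER_M = 15.00         # implied by $45.00 for 3M tokens
EXECUTOR_USD_PER_M = 4.00 / 3  # implied by $4.00 for 3M tokens, about $1.33

# Monthly token volume per phase, in millions (from the table above).
MONTHLY_TOKENS_M = {"plan": 0.5, "execute": 3.0, "review": 0.3}


def monthly_cost(rates: dict[str, float]) -> float:
    """Sum tokens (in millions) times the per-million rate for each phase."""
    return sum(MONTHLY_TOKENS_M[phase] * rates[phase] for phase in MONTHLY_TOKENS_M)


all_opus = monthly_cost(
    {"plan": OPUS_USD_PER_M, "execute": OPUS_USD_PER_M, "review": OPUS_USD_PER_M}
)
hybrid = monthly_cost(
    {"plan": OPUS_USD_PER_M, "execute": EXECUTOR_USD_PER_M, "review": OPUS_USD_PER_M}
)
savings = all_opus - hybrid

print(f"all Opus: ${all_opus:.2f}")
print(f"hybrid:   ${hybrid:.2f}")
print(f"savings:  ${savings:.2f} ({savings / all_opus:.0%})")
```

Changing `EXECUTOR_USD_PER_M` shows how sensitive the overall saving is to the executor's price, since planning and review stay on Opus either way.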

Common Mistakes

Using Budget Models for Planning

Budget models need explicit prompting for planning. They don’t naturally break down complex tasks into actionable steps. Use Opus/Sonnet for planning, always.

Skipping Context Files

Without context files, budget models make architectural mistakes. The 10 minutes spent generating context files saves hours of debugging wrong implementations.

Choosing Slow Models for Agents

Agent responsiveness matters. GLM’s variable 30-50 TPS is at best half of MiniMax’s consistent 100 TPS, and that gap compounds across every step an agent takes.

Not Using Parallel Agents

MiniMax has no weekly cap. The Reddit poster mentioned:

“you can push a ton across multiple agents during the initial 2 hours”

I now run 3-4 agents in parallel during peak productivity windows.
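
When several agents share one quota, it helps to track how much of the window is left before launching another. The sketch below treats the "1500/5hr" figure as a request count over a rolling window. Both are assumptions, because the table does not state the unit and the provider may use fixed windows instead. `WindowQuota` is a hypothetical helper.

```python
from collections import deque


class WindowQuota:
    """Rolling-window counter: at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit: int = 1500, window_seconds: float = 5 * 3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self._stamps: deque[float] = deque()

    def _expire(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()

    def remaining(self, now: float) -> int:
        """Requests still available at time `now` (seconds)."""
        self._expire(now)
        return self.limit - len(self._stamps)

    def try_acquire(self, now: float) -> bool:
        """Record one request if quota is left; return False when the window is full."""
        if self.remaining(now) <= 0:
            return False
        self._stamps.append(now)
        return True
```

Each agent would call `try_acquire(time.monotonic())` before a request and back off when it returns `False`. If the agents run in threads, wrap the calls in a lock, since this sketch is not thread-safe on its own.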

Summary

For agentic coding workflows:

  1. Use MiniMax 2.7 as executor — Fast, reliable, generous quotas
  2. Keep Opus/Sonnet for planning — Deep reasoning requires premium models
  3. Generate context files — Architecture.md, Patterns.md, Dependencies.md
  4. Run parallel agents — MiniMax handles concurrent requests well
  5. Monitor quality — Escalate to premium when executor struggles
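
The escalation in step 5 can be as simple as a retry loop with a fallback. This is a minimal sketch, assuming "struggles" means the executor's output fails some automated check such as tests, type checking, or the plan's file list. The callables are stand-ins for real model calls, and the two-attempt limit is an arbitrary choice.

```python
from typing import Callable


def run_with_escalation(
    task: str,
    executor: Callable[[str], str],
    premium: Callable[[str], str],
    passes_checks: Callable[[str], bool],
    max_executor_attempts: int = 2,
) -> tuple[str, str]:
    """Try the budget executor first; hand the task to the premium model if its
    output keeps failing the checks. Returns (model_used, output)."""
    for _ in range(max_executor_attempts):
        output = executor(task)
        if passes_checks(output):
            return "executor", output
    # The premium output is returned as-is; checking it is left to the caller.
    return "premium", premium(task)
```

Logging which tasks end up on the premium path is a cheap way to spot plans that were not explicit enough for the executor.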

The hybrid approach cut my monthly API costs by 72% while maintaining productivity. MiniMax won’t replace Opus for complex reasoning, but it doesn’t need to. For execution tasks, it’s more than capable.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others.
