Building Agentic Coding Workflows with Budget AI Models: MiniMax, GLM, and Kimi Tested
I was burning through my Opus budget running multiple AI agents in parallel. Each agent session consumed tokens like crazy, and my monthly bill was approaching $400. I needed to find budget models that could serve as executors while keeping Opus for planning.
After testing MiniMax, GLM, and Kimi for two weeks with my agent workflows, I found a clear winner: MiniMax 2.7 for execution, Opus for planning.
The Problem
Running AI agents continuously is expensive. Here’s what my typical workflow looked like:
```
+------------------+------------------+------------------+
| Task             | Tokens/Session   | Cost/Session     |
+------------------+------------------+------------------+
| Planning phase   | ~50k             | $0.75 (Opus)     |
| Execution phase  | ~200k            | $3.00 (Opus)     |
| Review phase     | ~30k             | $0.45 (Opus)     |
+------------------+------------------+------------------+
| Total per agent  | ~280k            | $4.20            |
+------------------+------------------+------------------+
```

With 5-10 agent sessions per day, costs exploded. The execution phase consumed the most tokens but required the least creativity. That’s when I started testing budget models as executors.
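To see how quickly this adds up, here is a rough cost model. It is a sketch that assumes a flat $15 per million tokens, which is the rate the table implies ($0.75 for 50k tokens). Real Opus pricing charges input and output tokens at different rates, so treat the absolute numbers as approximate.

```python
# Rough per-session cost model for an all-Opus agent workflow.
# Assumption: a flat $15 per million tokens, the rate implied by the table
# ($0.75 / 50k tokens). Real pricing differs for input vs. output tokens.
OPUS_USD_PER_MTOK = 15.0

# Approximate tokens per phase for one agent session (from the table).
SESSION_TOKENS = {
    "planning": 50_000,
    "execution": 200_000,
    "review": 30_000,
}


def phase_cost(tokens: int, usd_per_mtok: float = OPUS_USD_PER_MTOK) -> float:
    """Cost in USD for a token count at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_mtok


def session_cost(phases: dict[str, int]) -> float:
    """Total cost of one agent session across all of its phases."""
    return sum(phase_cost(t) for t in phases.values())


if __name__ == "__main__":
    per_session = session_cost(SESSION_TOKENS)
    print(f"Per session: ${per_session:.2f}")
    for sessions_per_day in (5, 10):
        print(f"{sessions_per_day} sessions/day: ${per_session * sessions_per_day:.2f}/day")
```

The execution phase alone accounts for about 71% of each session’s cost ($3.00 of $4.20). That made it the obvious candidate for a cheaper model.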
What I Tested
I evaluated three Chinese budget models for agent execution:
```
+-------------------+-----------------+------------------+------------------+
| Feature           | MiniMax 2.7     | GLM 5.1          | Kimi K2.5        |
+-------------------+-----------------+------------------+------------------+
| Speed (TPS)       | 100 (constant)  | 30-50 (variable) | 40-60 (variable) |
| Context window    | 32k             | 128k             | 100k+            |
| Quota system      | 1500/5hr window | Daily limits     | Variable         |
| Weekly cap        | None            | Yes              | Varies           |
| Cost efficiency   | Excellent       | Good             | Moderate         |
| Agent reliability | High            | Medium           | Medium           |
+-------------------+-----------------+------------------+------------------+
```

Why MiniMax Won
The Reddit discussion in r/opencodeCLI confirmed my experience:
“for a day-to-day driver minimax is my go-to if im going for cheap”
“minimax offers highspeed and 100tps constant all the time no matter what time of day is it”
Speed matters for agents. When an agent is executing a plan step-by-step, waiting for slow responses kills productivity. MiniMax’s consistent 100 TPS meant my agents kept moving without bottlenecks.
GLM had infrastructure issues that made it tedious:
“due to quite slow improvements of the infrastructure of glm main provider it’s been too tedious to wait for the code”
The Hybrid Workflow
I now use a three-phase approach:
```
+-------------------+     +-------------------+     +-------------------+
|    PLAN PHASE     |     |   CONTEXT PHASE   |     |   EXECUTE PHASE   |
|   (Opus/Sonnet)   |     | (Caliber/Manual)  |     |   (MiniMax 2.7)   |
+-------------------+     +-------------------+     +-------------------+
|                   |     |                   |     |                   |
| Analyze task      |     | Generate context  |     | Read plan         |
| Create plan       |---->| Architecture.md   |---->| Read context      |
| Define constraints|     | Patterns.md       |     | Implement code    |
|                   |     | Dependencies.md   |     | Verify against    |
+-------------------+     +-------------------+     |   plan            |
                                                    +-------------------+
```

Phase 1: Planning with Opus
Opus analyzes the task and creates a detailed plan:
```markdown
# Plan: Add User Dashboard

## Files to Create
- `/app/dashboard/page.tsx`
- `/app/dashboard/components/UserTable.tsx`
- `/app/dashboard/components/SearchFilter.tsx`

## Constraints
- Use existing auth context from `/contexts/AuthContext.tsx`
- Match patterns in `/app/users/page.tsx`
- Use shadcn/ui components (Table, Input, Button)

## Implementation Steps
1. Create page.tsx with layout and auth check
2. Create UserTable with mock data first
3. Add SearchFilter client-side state
4. Connect to /api/users endpoint

## Don't
- Don't add new dependencies
- Don't change auth implementation
- Don't create additional components beyond listed
```

The key is being explicit. Budget models follow plans well but need clear instructions.
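The executor relies on every plan having the same explicit sections. It can therefore help to build plans from a structured object rather than free text, so that a plan missing its files or steps is caught before it reaches the executor. This is a minimal sketch. The `Plan` class, its fields and `to_markdown` are hypothetical names and are not part of any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical structured plan that renders to the Markdown format above."""

    title: str
    files_to_create: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    donts: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # An executor can't follow a plan with no files or steps, so fail early.
        if not self.files_to_create or not self.steps:
            raise ValueError("plan needs at least one file and one step")

    def to_markdown(self) -> str:
        self.validate()
        lines = [f"# Plan: {self.title}", "", "## Files to Create"]
        lines += [f"- `{path}`" for path in self.files_to_create]
        lines += ["", "## Constraints"]
        lines += [f"- {c}" for c in self.constraints]
        lines += ["", "## Implementation Steps"]
        lines += [f"{i}. {s}" for i, s in enumerate(self.steps, start=1)]
        lines += ["", "## Don't"]
        lines += [f"- {d}" for d in self.donts]
        return "\n".join(lines)


plan = Plan(
    title="Add User Dashboard",
    files_to_create=["/app/dashboard/page.tsx", "/app/dashboard/components/UserTable.tsx"],
    constraints=["Use existing auth context from `/contexts/AuthContext.tsx`"],
    steps=["Create page.tsx with layout and auth check", "Create UserTable with mock data first"],
    donts=["Don't add new dependencies"],
)
print(plan.to_markdown())
```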
Phase 2: Context Files
This was the breakthrough from the Reddit discussion:
“what also makes a huge diff regardless of model is how good ur context is”

“when the agent actually understands the codebase architecture it stops making dumb mistakes way more often regardless of which budget model ur using”
I generate project-specific context files:
```markdown
# Project Structure
- `/app/` - Next.js 14 App Router pages
- `/components/` - Shared React components
- `/lib/` - Utility functions
- `/api/` - API route handlers

## Patterns
- Server Components for data fetching
- Client Components for interactivity ('use client' directive)
- shadcn/ui for UI components
- Tailwind CSS for styling

## Key Files
- `/contexts/AuthContext.tsx` - Auth state management
- `/lib/api.ts` - API helper functions
```

Caliber (an open-source tool) auto-generates these context files per repository. The Reddit poster mentioned hitting 250 stars and 90 PRs merged with this approach.
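Caliber automates this step. If you write the files by hand, a script can at least draft the structure section. The sketch below does not reproduce how Caliber works. It only lists a repository’s top-level directories and attaches descriptions from a hand-maintained mapping. `DESCRIPTIONS`, `IGNORED` and `draft_architecture` are hypothetical, and the Patterns and Key Files sections still need to be written by a person.

```python
from pathlib import Path

# Hand-maintained descriptions; unknown directories are flagged for a human.
DESCRIPTIONS = {
    "app": "Next.js 14 App Router pages",
    "components": "Shared React components",
    "lib": "Utility functions",
    "api": "API route handlers",
}

# Build/dependency output that carries no architectural information.
IGNORED = {"node_modules", "dist", "build"}


def draft_architecture(repo_root: str) -> str:
    """Return a Markdown 'Project Structure' section for the repo's top-level dirs."""
    root = Path(repo_root)
    dirs = sorted(
        p.name
        for p in root.iterdir()
        # Skip ignored dirs and hidden ones such as .git and .next.
        if p.is_dir() and p.name not in IGNORED and not p.name.startswith(".")
    )
    lines = ["# Project Structure"]
    for name in dirs:
        lines.append(f"- `/{name}/` - {DESCRIPTIONS.get(name, 'TODO: describe')}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Run from the repo root, e.g.: python draft_architecture.py > Architecture.md
    print(draft_architecture("."), end="")
```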
Phase 3: Execution with MiniMax
The executor prompt template:
```text
You are implementing a pre-planned feature.

CONTEXT FILES:
- Architecture: See [Architecture.md]
- Patterns: See [Patterns.md]

PLAN:
[Insert plan from planning phase]

INSTRUCTIONS:
1. Read context files first
2. Follow plan exactly
3. Create files listed in plan
4. Verify constraints are met

DO NOT:
- Deviate from the plan
- Add extra features
- Change existing patterns
```

Model Selection for Different Agent Tasks
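An orchestrator can encode the recommendations in the table below as a routing map, so that each request goes to the model assigned to its task type. This is a sketch. The model strings are placeholders, not real API model IDs. The fallback to the long-context model when a prompt exceeds the executor’s window is an added assumption, based on the 32k figure in the comparison table.

```python
from enum import Enum


class AgentTask(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"
    CODE_REVIEW = "code_review"
    LONG_CONTEXT = "long_context"


# Placeholder model names (hypothetical), mirroring the table's recommendations.
ROUTES: dict[AgentTask, str] = {
    AgentTask.PLANNING: "opus",
    AgentTask.EXECUTION: "minimax-2.7",
    AgentTask.CODE_REVIEW: "glm-5.1",
    AgentTask.LONG_CONTEXT: "kimi-k2.5",
}


def pick_model(task: AgentTask, prompt_tokens: int, executor_context_limit: int = 32_000) -> str:
    """Route a task to a model, falling back to the long-context model when needed."""
    # Assumed 32k executor window, taken from the comparison table.
    if task is AgentTask.EXECUTION and prompt_tokens > executor_context_limit:
        return ROUTES[AgentTask.LONG_CONTEXT]
    return ROUTES[task]
```

Parallel agents reuse the execution route, since both rows of the table point to MiniMax.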
```
+------------------------+-------------------+------------------------------+
| Agent Task             | Recommended Model | Reason                       |
+------------------------+-------------------+------------------------------+
| Planning               | Opus/Sonnet       | Deep reasoning, architecture |
| Execution              | MiniMax           | Fast, reliable, generous     |
| Code Review            | GLM 5.1           | Quality analysis             |
| Long Context           | Kimi K2.5         | 100k+ context window         |
| Parallel Agents        | MiniMax           | No weekly cap, 100 TPS       |
+------------------------+-------------------+------------------------------+
```

What I Tried First (That Didn’t Work)
Attempt 1: Using GLM for Everything
GLM’s context window is impressive, but the speed issues killed agent productivity. Waiting 5-10 seconds between token chunks made the agent feel sluggish.
Attempt 2: Kimi for Long Context
Kimi handles 100k+ tokens, which seemed perfect for large codebases. But the variable speed and quota limits made it unreliable for parallel agent execution.
Attempt 3: No Context Files
At first I had MiniMax implement features directly, without any context files. It made “dumb mistakes” such as using the wrong patterns, missing existing utilities and creating duplicate code.
The context files solved this. Now MiniMax understands the codebase architecture before writing code.
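One way to enforce reading the context files first is to inline them at the top of the executor prompt, ahead of the plan, and to refuse to run if any file is missing. This is a minimal sketch. It assumes both files sit in a single directory under the names used earlier (Architecture.md, Patterns.md). `build_executor_prompt` is a hypothetical helper, and the template text is the executor prompt from Phase 3.

```python
from pathlib import Path

TEMPLATE = """You are implementing a pre-planned feature.

CONTEXT FILES:
{context}

PLAN:
{plan}

INSTRUCTIONS:
1. Read context files first
2. Follow plan exactly
3. Create files listed in plan
4. Verify constraints are met

DO NOT:
- Deviate from the plan
- Add extra features
- Change existing patterns
"""


def build_executor_prompt(
    plan_md: str,
    context_dir: str,
    names: tuple[str, ...] = ("Architecture.md", "Patterns.md"),
) -> str:
    """Inline each context file (in order) above the plan; fail if one is missing."""
    sections = []
    for name in names:
        path = Path(context_dir) / name
        if not path.is_file():
            # Running the executor without context is the failure mode described above.
            raise FileNotFoundError(f"missing context file: {path}")
        sections.append(f"--- {name} ---\n{path.read_text(encoding='utf-8').strip()}")
    # Braces inside the plan are safe: format() only parses the template itself.
    return TEMPLATE.format(context="\n\n".join(sections), plan=plan_md.strip())
```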
The Cost Savings
```
+------------------+------------------+------------------+
| Metric (monthly) | All Opus         | Hybrid Approach  |
+------------------+------------------+------------------+
| Plan tokens      | 500k             | 500k             |
| Execute tokens   | 3M               | 3M               |
| Review tokens    | 300k             | 300k             |
+------------------+------------------+------------------+
| Plan cost        | $7.50            | $7.50            |
| Execute cost     | $45.00           | $4.00            |
| Review cost      | $4.50            | $4.50            |
+------------------+------------------+------------------+
| Total            | $57.00           | $16.00           |
+------------------+------------------+------------------+
| Savings          | -                | $41.00 (72%)     |
+------------------+------------------+------------------+
```

The execution phase cost dropped from $45 to $4 per month. That is a reduction of more than 90% in the most expensive phase.
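The totals are easy to re-derive. This sketch takes the token counts and the $4.00 MiniMax execution cost from the table. It assumes the same flat $15 per million tokens for Opus that the table implies. MiniMax’s effective rate works out to about $1.33 per million tokens ($4.00 / 3M).

```python
# Monthly figures from the table (USD). Assumption: Opus at a flat $15 per million tokens.
OPUS_RATE = 15.0 / 1_000_000
tokens = {"plan": 500_000, "execute": 3_000_000, "review": 300_000}

all_opus = {phase: n * OPUS_RATE for phase, n in tokens.items()}
hybrid = dict(all_opus, execute=4.00)  # execution moved to MiniMax ($4.00/month)

total_opus = sum(all_opus.values())
total_hybrid = sum(hybrid.values())
savings = total_opus - total_hybrid

print(f"All Opus: ${total_opus:.2f}, hybrid: ${total_hybrid:.2f}")
print(f"Savings: ${savings:.2f} ({savings / total_opus:.0%})")
print(f"Execution-phase reduction: {1 - hybrid['execute'] / all_opus['execute']:.0%}")
```

Running it prints $57.00 vs. $16.00 and a 72% saving overall. The execution phase alone falls by about 91%.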
Common Mistakes
Using Budget Models for Planning
Budget models need explicit prompting for planning. They don’t naturally break down complex tasks into actionable steps. Use Opus/Sonnet for planning, always.
Skipping Context Files
Without context files, budget models make architectural mistakes. The 10 minutes spent generating context files saves hours of debugging wrong implementations.
Choosing Slow Models for Agents
Agent responsiveness matters. GLM’s variable speed (30-50 TPS) vs MiniMax’s consistent 100 TPS makes a huge difference in perceived productivity.
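In wall-clock terms, streaming time is roughly output tokens divided by tokens per second. This sketch ignores time-to-first-token and network overhead. The 2,000-token step is a hypothetical size, not a measurement.

```python
def generation_seconds(output_tokens: int, tps: float) -> float:
    """Approximate time to stream a response at a steady tokens-per-second rate."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return output_tokens / tps


STEP_OUTPUT_TOKENS = 2_000  # hypothetical size of one agent step's output

for label, tps in [
    ("MiniMax (100 TPS)", 100),
    ("GLM, best case (50 TPS)", 50),
    ("GLM, worst case (30 TPS)", 30),
]:
    print(f"{label}: {generation_seconds(STEP_OUTPUT_TOKENS, tps):.0f}s per step")
```

For a hypothetical 20-step plan, that comes to about 7 minutes of pure generation on MiniMax versus 13-22 minutes on GLM.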
Not Using Parallel Agents
MiniMax has no weekly cap. The Reddit poster mentioned:
“you can push a ton across multiple agents during the initial 2 hours”
I now run 3-4 agents in parallel during peak productivity windows.
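An `asyncio.Semaphore` can keep 3-4 agents running without oversubscribing. This is a sketch with several assumptions:

- `run_agent` is a hypothetical placeholder for launching one executor session; it only sleeps.
- `RollingQuota` assumes the 1500/5hr quota from the comparison table counts requests. The source doesn’t state the unit.
- Each session consumes one unit for simplicity. A real session makes many model calls and would check the quota per call.

```python
import asyncio
import time
from collections import deque
from typing import Optional


class RollingQuota:
    """Sliding-window counter (assumed: 1,500 requests per 5 hours)."""

    def __init__(self, limit: int = 1_500, window_s: float = 5 * 3600):
        self.limit = limit
        self.window_s = window_s
        self._stamps: deque = deque()

    def try_acquire(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True


async def run_agent(task: str) -> str:
    """Hypothetical placeholder for one executor session."""
    await asyncio.sleep(0.01)
    return f"done: {task}"


async def run_all(
    tasks: list[str], max_parallel: int = 4, quota: Optional[RollingQuota] = None
) -> list[str]:
    sem = asyncio.Semaphore(max_parallel)
    quota = quota or RollingQuota()

    async def guarded(task: str) -> str:
        async with sem:  # at most max_parallel sessions run at once
            if not quota.try_acquire():
                return f"skipped (quota exhausted): {task}"
            return await run_agent(task)

    # gather() returns results in the same order as the input tasks.
    return list(await asyncio.gather(*(guarded(t) for t in tasks)))


if __name__ == "__main__":
    print("\n".join(asyncio.run(run_all([f"step {i}" for i in range(1, 9)]))))
```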
Summary
For agentic coding workflows:
- Use MiniMax 2.7 as executor — Fast, reliable, generous quotas
- Keep Opus/Sonnet for planning — Deep reasoning requires premium models
- Generate context files — Architecture.md, Patterns.md, Dependencies.md
- Run parallel agents — MiniMax handles concurrent requests well
- Monitor quality — Escalate to premium when executor struggles
The hybrid approach cut my monthly API costs by 72% while maintaining productivity. MiniMax won’t replace Opus for complex reasoning, but it doesn’t need to. For execution tasks, it’s more than capable.
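The last summary point, escalating when the executor struggles, can be made mechanical. Give the budget model a bounded number of attempts against an objective check, then hand the step to the premium model. This is a sketch. `execute`, `escalate` and `passes_checks` are hypothetical callables you supply; a type check or test run is a natural check. Two attempts is an arbitrary default.

```python
from typing import Callable


def run_step(
    step: str,
    execute: Callable[[str, str], str],
    escalate: Callable[[str], str],
    passes_checks: Callable[[str], bool],
    max_attempts: int = 2,
) -> tuple[str, str]:
    """Try the budget executor first; escalate to the premium model if it keeps failing.

    Returns (model_used, output). The escalated output is returned unchecked.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        output = execute(step, feedback)
        if passes_checks(output):
            return ("executor", output)
        # Feed the failure back so the next attempt isn't a blind retry.
        feedback = f"Attempt {attempt} failed checks; fix the errors and follow the plan exactly."
    return ("premium", escalate(step))
```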
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
Here are the most important links from this article, along with some further resources on this topic:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!