High for Planning, Medium for Implementation: The GPT-5.4 Workflow That Actually Works
I kept burning through my token budget. Every project started with grand architectural plans from GPT-5.4, then somewhere in the implementation phase, everything went sideways. Either the code didn’t match the plan, or I’d run out of context before finishing.
Then a Reddit comment changed everything: “I use High for planning and Medium for implementing the detailed plan.”
Simple. Obvious in hindsight. But I’d been doing it wrong for weeks.
The Problem With Single-Level Workflows
I started with a simple assumption: Higher reasoning = Better results. So I used High (or XHigh) for everything.
Here’s what happened on a typical feature request:
Task: Add user authentication to a FastAPI backend
Attempt with High throughout:
- Planning: Comprehensive, 40 lines of architecture decisions
- Implementation: Over-detailed code, excessive error handling
- Result: 3 hours, 200K tokens, over-engineered solution

The implementation phase was where things went wrong. High reasoning kept trying to re-architect during coding. It added layers I didn’t ask for. It second-guessed decisions from the planning phase.

Same task, Medium throughout:
- Planning: Missed several edge cases, no security considerations
- Implementation: Fast, but followed incomplete plan
- Result: 45 minutes, 50K tokens, broken auth flow

Neither approach worked well. I needed the deep thinking for planning, but not for execution.
Why This Happens
GPT-5.4’s reasoning levels aren’t just about “smartness.” They’re about cognitive style.
High reasoning excels at:
- Strategic thinking and comprehensive analysis
- Following instructions very closely
- Producing solid, predictable results
- Cross-system impact assessment
Medium reasoning excels at:
- Efficient execution of well-defined tasks
- Speed without overthinking
- Practical implementation
When you use High for implementation, it brings strategic thinking to tactical tasks. That’s misaligned. It’s like hiring a chief architect to lay individual bricks.
When you use Medium for planning, it brings tactical thinking to strategic tasks. That’s also misaligned. It’s like asking a bricklayer to design the building.
The Two-Tier Workflow
I now split every project into three phases that alternate between two reasoning levels:
PROJECT LIFECYCLE

PHASE 1: PLANNING (High Reasoning), ~10-20% of effort
- Architecture design
- Requirements analysis
- Risk assessment
- Create detailed implementation roadmap

PHASE 2: IMPLEMENTATION (Medium Reasoning), ~70-80% of effort
- Execute the plan step by step
- Write code according to specifications
- Run tests and fix issues
- Refine implementation details

PHASE 3: REVIEW (High Reasoning), ~10% of effort
- Validate implementation against plan
- Identify deviations or issues
- Recommend improvements

This workflow matches reasoning style to task requirements.
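To make the phase switching concrete, here is a minimal sketch of the three phases as a small driver function. It is an illustration under assumptions, not a definitive implementation: `call_model` and `split_steps` are hypothetical callables that you would supply (one wraps whatever client you use and accepts a reasoning level, the other turns the plan text into a list of steps), and the level names are placeholders for whatever your tooling calls them.

```python
from typing import Callable

# Reasoning level per phase, mirroring the three-phase workflow.
PHASE_LEVELS = {"planning": "high", "implementation": "medium", "review": "high"}

# Hypothetical signatures, not a real library API:
#   ModelCall(prompt, reasoning_level) -> model output text
#   StepSplitter(plan_text) -> list of step descriptions
ModelCall = Callable[[str, str], str]
StepSplitter = Callable[[str], list]


def run_workflow(call_model: ModelCall, task: str, split_steps: StepSplitter) -> dict:
    """Run plan -> implement -> review, switching levels between phases."""
    # Phase 1: one planning call at the high level.
    plan = call_model(f"Create a detailed plan for: {task}", PHASE_LEVELS["planning"])

    # Phase 2: one implementation call per plan step at the medium level.
    outputs = []
    for number, step in enumerate(split_steps(plan), start=1):
        prompt = (
            f"Implement step {number} from the plan: {step}. "
            "Follow the plan specifications exactly.\n\n"
            f"Plan:\n{plan}"
        )
        outputs.append(call_model(prompt, PHASE_LEVELS["implementation"]))

    # Phase 3: one review call at the high level, given plan and results.
    review_prompt = (
        "Review this implementation against the original plan. Identify "
        "deviations, missing pieces, and potential improvements.\n\n"
        f"Plan:\n{plan}\n\nImplementation:\n" + "\n\n".join(outputs)
    )
    review = call_model(review_prompt, PHASE_LEVELS["review"])
    return {"plan": plan, "outputs": outputs, "review": review}
```

Passing the model call in as a parameter keeps the sketch independent of any particular SDK, and it makes the level switching easy to check with a fake function before spending real tokens.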
A Real Example
Task: Add JWT authentication to an existing API
Phase 1: Planning with High Reasoning
I prompted with High reasoning:
    Analyze our FastAPI backend and create a detailed plan for adding
    JWT authentication. Include architecture decisions, implementation
    steps, risk assessment, and testing strategy.

High reasoning output:
    ## Architecture Decision
    - JWT tokens with refresh token rotation
    - Rate limiting on auth endpoints
    - bcrypt password hashing (cost factor 12)
    - Email verification flow

    ## Implementation Steps
    1. Create User model with email/password fields
    2. Implement password hashing middleware
    3. Create /auth/register endpoint
    4. Create /auth/login endpoint
    5. Implement JWT generation/validation
    6. Add refresh token logic
    7. Create auth middleware for protected routes
    8. Add rate limiting middleware
    9. Write tests for each endpoint
    10. Add email verification service

    ## Risk Assessment
    - SQL injection: Use parameterized queries
    - Brute force: Implement rate limiting
    - Token theft: Short-lived access tokens + refresh rotation

    ## Testing Strategy
    - Unit tests: Password hashing, token generation
    - Integration tests: Auth flow end-to-end
    - Security tests: Rate limiting, injection attempts

Comprehensive. Covered edge cases I wouldn’t have thought of. Now the implementation phase had a solid foundation.
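Since the implementation phase works through the plan one step at a time, it can help to pull the numbered steps out of the plan text programmatically. The sketch below assumes the plan is Markdown with a `## Implementation Steps` heading followed by a numbered list, as in the output above; `extract_steps` is a hypothetical helper name, and a plan in a different shape would need a different parser.

```python
import re


def extract_steps(plan: str, heading: str = "Implementation Steps") -> list:
    """Return the numbered items found under the given '## heading'."""
    steps = []
    in_section = False
    for line in plan.splitlines():
        stripped = line.strip()
        if stripped.startswith("##"):
            # Entering a new section; track whether it is the one we want.
            in_section = stripped.lstrip("#").strip() == heading
            continue
        match = re.match(r"(\d+)\.\s+(.*)", stripped)
        if in_section and match:
            steps.append(match.group(2))
    return steps


# Shortened sample in the same shape as the plan above.
sample_plan = """\
## Architecture Decision
- JWT tokens with refresh token rotation

## Implementation Steps
1. Create User model with email/password fields
2. Implement password hashing middleware
3. Create /auth/register endpoint

## Risk Assessment
- Brute force: Implement rate limiting
"""

for number, step in enumerate(extract_steps(sample_plan), start=1):
    print(f"Implement step {number} from the plan: {step}.")
```

Each printed line can then serve as the opening of a Medium-level implementation prompt.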
Phase 2: Implementation with Medium Reasoning
For each step, I switched to Medium:
    Implement step 3 from the plan: Create /auth/register endpoint.
    Follow the plan specifications exactly.

Medium reasoning output:
    from fastapi import APIRouter, HTTPException
    from pydantic import BaseModel, EmailStr
    from passlib.context import CryptContext

    router = APIRouter()
    # Password hashing context using bcrypt
    pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

    class UserRegister(BaseModel):
        email: EmailStr
        password: str

    @router.post("/auth/register")
    async def register(user: UserRegister):
        # Reject short passwords before hashing
        if len(user.password) < 12:
            raise HTTPException(
                status_code=400,
                detail="Password must be at least 12 characters"
            )

        hashed_password = pwd_context.hash(user.password)

        # create_user is not defined in this snippet; it is assumed to be
        # an async persistence helper defined elsewhere in the project
        db_user = await create_user(
            email=user.email,
            password_hash=hashed_password
        )

        return {"message": "User registered successfully", "user_id": db_user.id}

Clean, focused, followed the plan. No over-engineering. No re-architecting. Just execution.
Phase 3: Review with High Reasoning
After implementation:
    Review this implementation against the original plan. Identify
    deviations, missing pieces, and potential improvements.

High reasoning caught what Medium missed:
    ## Adherence to Plan
    ✅ User model created with correct fields
    ✅ Password hashing uses bcrypt cost factor 12
    ✅ Rate limiting implemented
    ⚠️ Email verification not yet implemented (Step 10)

    ## Identified Issues
    1. Missing password complexity validation
    2. No duplicate email check before registration
    3. Missing CSRF protection for cookies

    ## Recommendations
    1. Add regex validation for password complexity
    2. Add unique constraint check on email field
    3. Implement CSRF token for state-changing requests

Token Usage Comparison
I measured this approach against my previous single-level workflows (token counts per phase):
| Approach | Planning | Implementation | Review | Total |
|---|---|---|---|---|
| High throughout | 45K | 180K | 25K | 250K |
| Medium throughout | 15K | 60K | 10K | 85K |
| High→Medium→High | 40K | 70K | 20K | 130K |
At 130K tokens, the two-tier approach costs about 53% more than Medium-only (85K) but produces far better results. It also uses about 48% fewer tokens than High-only (250K) while avoiding over-engineering.
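The comparison can be reproduced with a few lines of arithmetic. The snippet below only recomputes the totals and relative differences from the numbers in the table; the dictionary layout and variable names are my own choices for illustration.

```python
# Token counts in thousands, copied from the table above.
usage = {
    "High throughout": {"planning": 45, "implementation": 180, "review": 25},
    "Medium throughout": {"planning": 15, "implementation": 60, "review": 10},
    "High→Medium→High": {"planning": 40, "implementation": 70, "review": 20},
}

# Sum the three phases for each approach.
totals = {name: sum(phases.values()) for name, phases in usage.items()}

mixed = totals["High→Medium→High"]
saving_vs_high = 1 - mixed / totals["High throughout"]
extra_vs_medium = mixed / totals["Medium throughout"] - 1

print(totals)
print(f"Saving vs High-only: {saving_vs_high:.0%}")
print(f"Extra vs Medium-only: {extra_vs_medium:.0%}")
```

Most of the saving over High-only comes from the implementation phase: 110K of the 120K difference.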
When to Deviate
This isn’t a rigid rule. There are exceptions:
Use High for implementation when:
- Implementing a critical security feature
- The plan has ambiguities that need interpretation
- You’re in uncharted territory without clear specs
Use Medium for planning when:
- The task is straightforward and well-understood
- You have existing templates or patterns to follow
- Speed matters more than comprehensive coverage
Use XHigh for planning when:
- Large codebase with complex file relationships
- Cross-module dependencies need tracking
- High reasoning fails to capture all considerations
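These exceptions can be written down as a small rule function. This is a sketch under assumptions: the flag names are hypothetical, the level strings are placeholders, and the precedence (a large codebase wins over a well-understood task when planning) is my own choice, since the lists above do not rank the exceptions.

```python
def choose_level(
    phase: str,
    *,
    security_critical: bool = False,
    plan_is_ambiguous: bool = False,
    well_understood: bool = False,
    large_codebase: bool = False,
) -> str:
    """Pick a reasoning level for a phase, applying the exceptions above."""
    if phase == "planning":
        if large_codebase:
            return "xhigh"  # complex file relationships, cross-module deps
        return "medium" if well_understood else "high"
    if phase == "implementation":
        if security_critical or plan_is_ambiguous:
            return "high"  # deviation: implementation needs deeper thought
        return "medium"
    if phase == "review":
        return "high"
    raise ValueError(f"unknown phase: {phase!r}")


print(choose_level("implementation"))
print(choose_level("implementation", security_critical=True))
print(choose_level("planning", large_codebase=True))
```

Keeping the defaults at High for planning and Medium for implementation means the exceptions have to be switched on deliberately.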
Common Anti-Patterns
Anti-Pattern 1: High for simple implementation
Task: Write a simple debounce function
High reasoning output:
- Generic type constraints
- Multiple overload signatures
- Extensive JSDoc comments
- A wrapper class "for extensibility"
- 47 lines for a 5-line function

Fix: Use Medium for straightforward implementation tasks.
Anti-Pattern 2: Medium for complex planning
Task: Plan authentication system architecture
Medium reasoning output:
- Basic JWT approach
- Missed refresh token rotation
- No rate limiting consideration
- Incomplete risk assessment

Fix: Always use High or XHigh for architectural decisions.
Anti-Pattern 3: Never switching levels
Some developers pick one level and stick with it regardless of task. This sacrifices either quality (Medium for everything) or efficiency (High for everything).
The Decision Matrix
After months of experimentation, here’s my reference:
| Task Type | Level | Reason |
|---|---|---|
| Initial architecture | High/XHigh | Strategic decisions need depth |
| Requirements analysis | High | Complex reasoning for edge cases |
| Code implementation | Medium | Efficiency for well-defined tasks |
| Bug fixing | Medium-High | Depends on bug complexity |
| Code review | High | Need comprehensive analysis |
| Testing | Low-Medium | Routine execution |
| Refactoring | Medium | Following clear patterns |
| Documentation | Low-Medium | Straightforward writing |
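The matrix translates directly into a lookup table. In this sketch each task type maps to a (lower, upper) pair so that ranges such as Medium-High can be represented; the `complex_case` flag and the rule "use the lower level unless the case is complex" are assumptions of mine, not something the table prescribes.

```python
# (lower, upper) reasoning level per task type, taken from the table above.
DECISION_MATRIX = {
    "initial architecture": ("high", "xhigh"),
    "requirements analysis": ("high", "high"),
    "code implementation": ("medium", "medium"),
    "bug fixing": ("medium", "high"),
    "code review": ("high", "high"),
    "testing": ("low", "medium"),
    "refactoring": ("medium", "medium"),
    "documentation": ("low", "medium"),
}


def level_for(task_type: str, complex_case: bool = False) -> str:
    """Return the level for a task type; escalate within the range if complex."""
    lower, upper = DECISION_MATRIX[task_type.lower()]
    return upper if complex_case else lower


print(level_for("Bug fixing"))
print(level_for("Bug fixing", complex_case=True))
print(level_for("Code review"))
```

An unknown task type raises a `KeyError`, which is a useful prompt to decide the level consciously rather than fall back to a default.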
How to Start
- Before your next project, explicitly decide which reasoning level to use for each phase
- Start with High for planning, generate a detailed plan
- Switch to Medium for implementation, execute the plan
- End with High for review, catch what Medium missed
- Track results and adjust based on your specific use cases
The key insight: Different phases of work require different cognitive styles. Match the reasoning level to the task, not the project.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.