grill-me vs GSD Skill for Codex: Which Planning Approach Works Better
I kept getting half-baked implementations from Codex. The AI would make assumptions about my requirements, code around problems I didn’t have, and deliver features that needed three revision cycles. Then I found two different planning skills: grill-me and GSD. After testing both for a month, I realized they solve different problems.
The Quick Answer
Use grill-me for single complex features. Use GSD for full projects or multi-phase work.
    [What are you building?]
    |-- Single complex feature --> grill-me (interview-style planning)
    |-- Multi-phase project    --> GSD (full lifecycle management)
    |-- Complete application   --> GSD (research + roadmap + execution)

The key difference isn’t quality. Both work well. The difference is scope.
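The quick answer above can be written down as a small lookup. This is only a sketch of the rule of thumb in this post; `Scope`, `RECOMMENDATION`, and `pick_skill` are hypothetical names and not part of either skill.

```python
from enum import Enum


class Scope(Enum):
    """Hypothetical labels for the three branches of the quick answer."""
    SINGLE_FEATURE = "single complex feature"
    MULTI_PHASE = "multi-phase project"
    COMPLETE_APP = "complete application"


# Rule of thumb from this post, not an official recommendation of either skill.
RECOMMENDATION = {
    Scope.SINGLE_FEATURE: "grill-me",
    Scope.MULTI_PHASE: "GSD",
    Scope.COMPLETE_APP: "GSD",
}


def pick_skill(scope: Scope) -> str:
    """Return the planning skill this post suggests for the given scope."""
    return RECOMMENDATION[scope]


print(pick_skill(Scope.SINGLE_FEATURE))  # grill-me
```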
My Problem with Assumption-Based Planning
Before I found these skills, my Codex workflow looked like this:
- Describe feature vaguely
- Codex makes assumptions
- I get code that doesn’t match my mental model
- Revision cycle begins
- Repeat until frustrated
A Reddit post titled “I tried the grill-me skill and it completely changed how I plan with Codex” caught my attention. The poster claimed a 90% success rate improvement. I tried it.
Then someone in the comments mentioned GSD. I tried that too.
How grill-me Works
The grill-me skill is five sentences. That’s it. Here’s the entire prompt:
    Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree... Ask the questions one at a time. If a question can be answered by exploring the codebase, explore the codebase instead.

What happens when you use it:
- Codex switches from “generate mode” to “interview mode”
- It asks one question at a time (not a question dump)
- It explores your codebase to answer questions when possible
- It stops when we reach shared understanding
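The behaviour in the list above can be pictured as a loop. The sketch below is a toy model of that loop, not how Codex implements it: `Question`, `run_interview`, and the two callbacks are hypothetical names, and the "codebase" is just a dictionary of facts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    key: str   # hypothetical identifier for one branch of the design tree
    text: str  # the question as it would be shown to the user


def run_interview(
    questions: list[Question],
    lookup_in_codebase: Callable[[str], Optional[str]],
    ask_user: Callable[[str], str],
) -> dict[str, str]:
    """Resolve questions one at a time, preferring the codebase over the user."""
    decisions: dict[str, str] = {}
    for question in questions:  # one question at a time, never a dump
        answer = lookup_in_codebase(question.key)
        if answer is None:  # the codebase cannot answer it, so ask the user
            answer = ask_user(question.text)
        decisions[question.key] = answer
    return decisions  # every branch resolved: shared understanding


# Toy stand-ins: a dict plays the codebase, canned replies play the user.
codebase_facts = {"query_patterns": "relational joins"}
canned_replies = {"What's your expected data volume?": "About 10k records per day"}

result = run_interview(
    [
        Question("query_patterns", "What do your query patterns look like?"),
        Question("volume", "What's your expected data volume?"),
    ],
    lookup_in_codebase=codebase_facts.get,
    ask_user=canned_replies.__getitem__,
)
print(result)
```

Only the second question reaches the user here, because the first one is answered from the stand-in codebase.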
Here’s what a session looks like:
    Me: Use grill-me to help me decide on the database architecture

    Codex: Let me explore your codebase first...
    [explores existing data models, query patterns]
    Based on your current setup, what's your expected data volume?

    Me: About 10k records per day

    Codex: With that volume, PostgreSQL makes sense over MongoDB because your query patterns show relational joins. What's your read/write ratio?

    Me: 80% reads, 20% writes

    Codex: That suggests read replicas would help. Do you need real-time consistency across replicas, or is eventual consistency acceptable?

The interview continues until Codex understands my actual requirements, not its assumptions about them.
How GSD Works
GSD (Get Shit Done) is a complete workflow system. It’s not a single prompt. It’s a meta-prompting, context engineering, and spec-driven development system with multiple commands:
    /gsd:new-project    → Questions → Research → Requirements → Roadmap
    /gsd:discuss-phase  → Capture implementation decisions
    /gsd:plan-phase     → Research + Plan + Verify
    /gsd:execute-phase  → Parallel execution in fresh contexts
    /gsd:verify-work    → User acceptance testing
    /gsd:ship           → Create PR

When I ran /gsd:new-project for a dashboard application:
    1. Interview phase: 15 questions about requirements
    2. Research phase: 4 parallel agents spawned
       - Stack researcher: found React dashboard libraries
       - Feature researcher: investigated dashboard patterns
       - Architecture researcher: discovered conventions
       - Pitfall researcher: identified common mistakes
    3. Output: PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md
    4. Roadmap: 3 phases with atomic task breakdown

The key difference: GSD doesn’t just plan. It creates artifacts, spawns research agents, and handles execution.
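One way to keep the command order straight is to treat the lifecycle as a checklist. The sketch below only reuses the command names listed above; `command_checklist` is a hypothetical helper. It assumes the four phase commands repeat once per roadmap phase and that `/gsd:ship` runs once at the end, which the post does not confirm.

```python
SETUP_COMMAND = "/gsd:new-project"
PHASE_COMMANDS = [
    "/gsd:discuss-phase",
    "/gsd:plan-phase",
    "/gsd:execute-phase",
    "/gsd:verify-work",
]
SHIP_COMMAND = "/gsd:ship"


def command_checklist(phase_count: int) -> list[tuple[str, str]]:
    """Build a (label, command) checklist for a roadmap with `phase_count` phases.

    Assumption: the four phase commands repeat once per roadmap phase and
    /gsd:ship runs once at the end. The post lists the commands in this order
    but does not say how GSD itself sequences them.
    """
    checklist = [("setup", SETUP_COMMAND)]
    for phase in range(1, phase_count + 1):
        for command in PHASE_COMMANDS:
            checklist.append((f"phase {phase}", command))
    checklist.append(("ship", SHIP_COMMAND))
    return checklist


# A 3-phase roadmap, like the dashboard example above.
for label, command in command_checklist(3):
    print(f"{label:8} {command}")
```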
Side-by-Side Comparison
After a month of testing both, here’s how they compare:
| Aspect           | grill-me              | GSD                              |
|------------------|-----------------------|----------------------------------|
| Scope            | Single feature        | Full project lifecycle           |
| Style            | Pure interview        | Interview + Research + Execute   |
| Output           | Shared understanding  | PROJECT.md, ROADMAP.md, commits  |
| Execution        | None                  | Parallel waves, fresh contexts   |
| Git integration  | Manual                | Automatic atomic commits         |
| Research         | User provides         | System spawns agents             |
| Context handling | Single context        | Fresh subagent contexts          |
| Overhead         | Minimal (5 sentences) | Significant (full system)        |
| Learning curve   | None                  | Moderate                         |
| Multi-runtime    | Codex-focused         | Claude Code, Codex, Cursor, etc. |

When grill-me Wins
I use grill-me for these scenarios:
Architectural decisions. “Should this be a monolith or microservices?” grill-me forces me to think through the trade-offs one question at a time.
Complex single features. “Design the authentication system.” The interview surfaces hidden requirements I didn’t know I had.
Refactoring strategy. “How do I untangle this module?” grill-me walks the dependency tree with me.
Bug investigation. “This bug is weird, help me think through it.” The interview process often reveals the root cause.
The Reddit poster was right: “grill-me is great for larger/nuanced features; I wouldn’t use it for everything.”
When GSD Wins
I use GSD for these scenarios:
Building complete applications. “Create a user management system.” GSD handles the full lifecycle from requirements to shipped code.
Multi-phase work. “Phase 1: Auth, Phase 2: Dashboard, Phase 3: API.” GSD creates ROADMAP.md with phase breakdown and executes each phase systematically.
Projects needing research. “Build a React dashboard with best practices.” GSD spawns research agents to find libraries, patterns, and pitfalls.
Clean git history required. Each task gets an atomic commit. No massive commits mixing unrelated changes.
The execution phase is where GSD shines:
    Wave 1: [Plan A] [Plan B] [Plan C]   (parallel, fresh 200k contexts)
        ↓
    Wave 2: [Plan D] [Plan E]            (parallel, fresh contexts)
        ↓
    Each plan: atomic commit, clean history

GSD solves “context rot” - the quality degradation that happens when your context window fills with old decisions and irrelevant details. Each execution wave starts with a fresh context.
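The wave diagram above can be modelled in a few lines. This is a sketch of the idea only, not GSD's implementation: threads stand in for subagents, a new empty list stands in for a fresh context, and `run_plan` and `run_waves` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_plan(plan_name: str) -> str:
    """Stand-in for one plan: starts from an empty context and 'commits' once."""
    context: list[str] = []  # fresh context: nothing inherited from other plans
    context.append(f"instructions for {plan_name}")
    return f"commit: {plan_name} ({len(context)} context item)"


def run_waves(waves: list[list[str]]) -> list[str]:
    """Run waves in order; plans inside a wave run in parallel."""
    commits: list[str] = []
    for wave in waves:
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            # map() preserves input order, so the commit log is deterministic
            commits.extend(pool.map(run_plan, wave))
        # leaving the `with` block waits for the whole wave to finish
    return commits


history = run_waves([["Plan A", "Plan B", "Plan C"], ["Plan D", "Plan E"]])
for line in history:
    print(line)
```

Each plan reports a context of exactly one item, no matter how many plans ran before it. That is the property the fresh-context design is after.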
The Combined Workflow
I found a pattern that works better than either skill alone:
    1. Start with grill-me for core architectural decisions
       → Reach shared understanding on design choices
       → Lock in those decisions mentally
    2. Switch to GSD for execution
       → /gsd:new-project with decisions already made
       → Faster intake interview because answers are ready
       → Execute with locked-in understanding

This combination gives me the clarity from grill-me’s interview process and the execution power from GSD’s workflow system.
Installation
grill-me:
    # Create SKILL.md manually with the 5-sentence prompt
    # Or if available as package:
    npx skills add grill-me

GSD:

    # For Codex specifically
    npx get-shit-done-cc --codex --global

    # Verify installation
    $gsd-help

What Still Frustrates Me
grill-me:
- No execution phase. I still have to code or prompt Codex to code after planning.
- No artifacts. The shared understanding stays in the conversation, not in a file I can reference later.
GSD:
- Overhead for simple tasks. Using GSD for a single function refactor feels like using a bulldozer for gardening.
- Learning curve. The full workflow takes time to understand. I still forget commands sometimes.
My Decision Matrix
                      ┌─────────────────────┐
                      │ What are you doing? │
                      └─────────────────────┘
                                 │
             ┌───────────────────┼───────────────────┐
             │                   │                   │
             ▼                   ▼                   ▼
      Single decision       Multi-phase        Complete app
    (1 complex choice)      (2+ phases)        (new project)
             │                   │                   │
             ▼                   ▼                   ▼
       ┌───────────┐       ┌───────────┐       ┌───────────┐
       │ grill-me  │       │    GSD    │       │    GSD    │
       └───────────┘       └───────────┘       └───────────┘
             │                   │                   │
             ▼                   ▼                   ▼
      Interview only       discuss→plan         new-project
                           →execute             →full cycle
                           →verify

Summary
In this post, I compared grill-me and GSD skills for Codex planning. The key insight: grill-me solves the “AI makes wrong assumptions” problem for single complex decisions. GSD solves the “context rot” problem for multi-phase projects.
grill-me:
- Tiny, zero-overhead (5 sentences)
- Perfect for architectural decisions and complex single features
- Leaves execution to you
GSD:
- Complete lifecycle management
- Perfect for building applications and multi-phase work
- Handles execution with parallel agents and atomic commits
The Reddit insight captures it: “grill-me is great for larger/nuanced features; I wouldn’t use it for everything.” Use grill-me when you need clarity on one thing. Use GSD when you need to build many things.
Final Words + More Resources
My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 GSD GitHub Repository
- 👨💻 GSD npm Package
- 👨💻 Reddit Discussion: grill-me skill experience
- 👨💻 GSD Discord Community
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!