grill-me vs GSD Skill for Codex: Which Planning Approach Works Better

Apr 1, 2026

I kept getting half-baked implementations from Codex. The AI would make assumptions about my requirements, code around problems I didn’t have, and deliver features that needed three revision cycles. Then I found two different planning skills: grill-me and GSD. After testing both for a month, I realized they solve different problems.

The Quick Answer

Use grill-me for single complex features. Use GSD for full projects or multi-phase work.

[What are you building?]
  |-- Single complex feature --> grill-me (interview-style planning)
  |-- Multi-phase project --> GSD (full lifecycle management)
  |-- Complete application --> GSD (research + roadmap + execution)

The key difference isn’t quality. Both work well. The difference is scope.

My Problem with Assumption-Based Planning

Before I found these skills, my Codex workflow looked like this:

Describe feature vaguely
Codex makes assumptions
I get code that doesn’t match my mental model
Revision cycle begins
Repeat until frustrated

A Reddit post titled “I tried the grill-me skill and it completely changed how I plan with Codex” caught my attention. The poster claimed a 90% improvement in their success rate. I tried it.

Then someone in the comments mentioned GSD. I tried that too.

How grill-me Works

The grill-me skill is five sentences. That’s it. Here’s the prompt, with one passage elided:

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree... Ask the questions one at a time. If a question can be answered by exploring the codebase, explore the codebase instead.

What happens when you use it:

Codex switches from “generate mode” to “interview mode”
It asks one question at a time (not a question dump)
It explores your codebase to answer questions when possible
It stops when we reach shared understanding
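A rough Python model of that loop helps me reason about it: a depth-first walk over a design tree that checks the codebase before asking, and asks exactly one question per step. This is only a mental model of the prompt's behavior, not how Codex or grill-me works internally. `Question`, `grill`, `explore_codebase`, and `ask_user` are hypothetical names, and the canned answers are taken from the session below.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Question:
    text: str
    # Follow-up questions that only make sense once this one is answered.
    children: list[Question] = field(default_factory=list)


def grill(root: Question,
          explore_codebase: Callable[[str], Optional[str]],
          ask_user: Callable[[str], str]) -> dict[str, str]:
    """Walk a design tree depth-first, resolving one question at a time."""
    answers: dict[str, str] = {}
    stack = [root]
    while stack:
        question = stack.pop()
        # Prefer evidence from the codebase; only ask the user if none is found.
        answer = explore_codebase(question.text)
        if answer is None:
            answer = ask_user(question.text)
        answers[question.text] = answer
        # Reverse so children are visited in the order they were declared.
        stack.extend(reversed(question.children))
    # Every branch has been walked: this is the "shared understanding".
    return answers


# Hypothetical design tree and canned answers for a demo run.
tree = Question("What's your expected data volume?", [
    Question("Which query patterns does the code use?"),
    Question("What's your read/write ratio?"),
])
canned = {
    "What's your expected data volume?": "About 10k records per day",
    "What's your read/write ratio?": "80% reads, 20% writes",
}
answers = grill(
    tree,
    explore_codebase=lambda q: "relational joins" if "query patterns" in q else None,
    ask_user=canned.__getitem__,
)
print(answers)
```

Swapping `ask_user` for the built-in `input` turns the demo into an interactive session, one prompt at a time.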

Here’s what a session looks like:

Me: Use grill-me to help me decide on the database architecture

Codex: Let me explore your codebase first...
[explores existing data models, query patterns]

Based on your current setup, what's your expected data volume?

Me: About 10k records per day

Codex: With that volume, PostgreSQL makes sense over MongoDB because your query patterns show relational joins. What's your read/write ratio?

Me: 80% reads, 20% writes

Codex: That suggests read replicas would help. Do you need real-time consistency across replicas, or is eventual consistency acceptable?

The interview continues until Codex understands my actual requirements, not its assumptions about them.

How GSD Works

GSD (Get Shit Done) is a complete workflow system. It’s not a single prompt. It’s a meta-prompting, context engineering, and spec-driven development system with multiple commands:

/gsd:new-project    → Questions → Research → Requirements → Roadmap
/gsd:discuss-phase  → Capture implementation decisions
/gsd:plan-phase     → Research + Plan + Verify
/gsd:execute-phase  → Parallel execution in fresh contexts
/gsd:verify-work    → User acceptance testing
/gsd:ship           → Create PR
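The commands above imply a fixed order. Here is a small Python sketch that lists that order for a roadmap with N phases. It rests on an assumption of mine, not something taken from GSD's docs: new-project and ship run once, while the four phase commands repeat for every phase. `gsd_run_order` and `PHASE_COMMANDS` are hypothetical names, and the code only prints command names; it does not call GSD.

```python
from typing import Optional

# Per-phase loop, in the order the commands are listed above.
PHASE_COMMANDS = ["/gsd:discuss-phase", "/gsd:plan-phase",
                  "/gsd:execute-phase", "/gsd:verify-work"]


def gsd_run_order(num_phases: int) -> list[tuple[Optional[int], str]]:
    """Return (phase, command) pairs; phase is None for project-level steps.

    Assumption: new-project runs once, the four phase commands repeat for
    every roadmap phase, and ship runs once at the end.
    """
    if num_phases < 1:
        raise ValueError("a roadmap needs at least one phase")
    order: list[tuple[Optional[int], str]] = [(None, "/gsd:new-project")]
    for phase in range(1, num_phases + 1):
        order.extend((phase, cmd) for cmd in PHASE_COMMANDS)
    order.append((None, "/gsd:ship"))
    return order


for phase, cmd in gsd_run_order(3):
    label = f"phase {phase}" if phase else "project"
    print(f"{label:>8}: {cmd}")
```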

When I ran /gsd:new-project for a dashboard application:

1. Interview phase: 15 questions about requirements
2. Research phase: 4 parallel agents spawned
   - Stack researcher: found React dashboard libraries
   - Feature researcher: investigated dashboard patterns
   - Architecture researcher: discovered conventions
   - Pitfall researcher: identified common mistakes
3. Output: PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md
4. Roadmap: 3 phases with atomic task breakdown
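The research step has a fan-out/fan-in shape: four independent researchers run at once, and their findings are gathered before anything gets written. A minimal Python sketch of that shape using the standard library's `ThreadPoolExecutor`. The `research` function is a hypothetical stand-in for an agent; GSD's actual researchers are model sessions, not Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

# The four focus areas from the /gsd:new-project run above.
FOCUSES = ("stack", "features", "architecture", "pitfalls")


def research(focus: str, project: str) -> str:
    # Hypothetical stand-in for one research agent; a real agent would
    # read docs and the codebase, then return written findings.
    return f"{focus} findings for {project}"


def run_research(project: str) -> dict[str, str]:
    """Fan out one worker per focus area, then fan the results back in."""
    with ThreadPoolExecutor(max_workers=len(FOCUSES)) as pool:
        futures = {focus: pool.submit(research, focus, project) for focus in FOCUSES}
        # .result() blocks until each worker finishes; keys follow FOCUSES.
        return {focus: future.result() for focus, future in futures.items()}


findings = run_research("dashboard")
for focus, text in findings.items():
    print(f"{focus}: {text}")
```

The fan-in point is where the combined findings would feed documents like REQUIREMENTS.md and ROADMAP.md.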

The key difference: GSD doesn’t just plan. It creates artifacts, spawns research agents, and handles execution.

Side-by-Side Comparison

After a month of testing both, here’s how they compare:

| Aspect           | grill-me              | GSD                              |
|------------------|-----------------------|----------------------------------|
| Scope            | Single feature        | Full project lifecycle           |
| Style            | Pure interview        | Interview + Research + Execute   |
| Output           | Shared understanding  | PROJECT.md, ROADMAP.md, commits  |
| Execution        | None                  | Parallel waves, fresh contexts   |
| Git integration  | Manual                | Automatic atomic commits         |
| Research         | User provides         | System spawns agents             |
| Context handling | Single context        | Fresh subagent contexts          |
| Overhead         | Minimal (5 sentences) | Significant (full system)        |
| Learning curve   | None                  | Moderate                         |
| Multi-runtime    | Codex-focused         | Claude Code, Codex, Cursor, etc. |

When grill-me Wins

I use grill-me for these scenarios:

Architectural decisions. “Should this be a monolith or microservices?” grill-me forces me to think through the trade-offs one question at a time.

Complex single features. “Design the authentication system.” The interview surfaces hidden requirements I didn’t know I had.

Refactoring strategy. “How do I untangle this module?” grill-me walks the dependency tree with me.

Bug investigation. “This bug is weird, help me think through it.” The interview process often reveals the root cause.

The Reddit poster was right: “grill-me is great for larger/nuanced features; I wouldn’t use it for everything.”

When GSD Wins

I use GSD for these scenarios:

Building complete applications. “Create a user management system.” GSD handles the full lifecycle from requirements to shipped code.

Multi-phase work. “Phase 1: Auth, Phase 2: Dashboard, Phase 3: API.” GSD creates ROADMAP.md with phase breakdown and executes each phase systematically.

Projects needing research. “Build a React dashboard with best practices.” GSD spawns research agents to find libraries, patterns, and pitfalls.

Clean git history required. Each task gets an atomic commit. No massive commits mixing unrelated changes.

The execution phase is where GSD shines:

Wave 1: [Plan A] [Plan B] [Plan C]  (parallel, fresh 200k contexts)
   ↓
Wave 2: [Plan D] [Plan E]           (parallel, fresh contexts)
   ↓
Each plan: atomic commit, clean history

GSD solves “context rot”: the quality degradation that happens when your context window fills with old decisions and irrelevant details. Each execution wave starts with a fresh context.
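The wave structure boils down to dependency layering: a plan can start once everything it depends on is done, and plans in the same wave are independent of each other. A Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The dependency edges, `plan_waves`, `execute_waves`, and the commit strings are hypothetical; a real run would spawn agents and make git commits rather than return strings.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable


def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group plans so each wave depends only on plans in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def execute_waves(waves: list[list[str]],
                  run_plan: Callable[[dict], str]) -> list[str]:
    """Run plans within a wave in parallel; waves run one after another."""
    commits: list[str] = []
    for wave in waves:
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            # Each plan gets a brand-new context holding only its own plan,
            # so nothing accumulated by earlier plans leaks in.
            commits.extend(pool.map(lambda plan: run_plan({"plan": plan}), wave))
    return commits


# Hypothetical dependencies: D needs A and B, E needs C.
deps = {"A": set(), "B": set(), "C": set(), "D": {"A", "B"}, "E": {"C"}}
waves = plan_waves(deps)
print(waves)  # [['A', 'B', 'C'], ['D', 'E']]
commits = execute_waves(waves, lambda ctx: f"feat: implement plan {ctx['plan']}")
print(commits)
```

Putting D and E in a later wave is what lets A, B, and C run side by side without stepping on each other, and one string per plan mirrors the one-commit-per-plan idea.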

The Combined Workflow

I found a pattern that works better than either skill alone:

1. Start with grill-me for core architectural decisions
   → Reach shared understanding on design choices
   → Lock in those decisions mentally

2. Switch to GSD for execution
   → /gsd:new-project with decisions already made
   → Faster intake interview because answers are ready
   → Execute with locked-in understanding

This combination gives me the clarity from grill-me’s interview process and the execution power from GSD’s workflow system.
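One way to make the hand-off in step 2 concrete is to write the grill-me outcomes down before starting the GSD intake, so the answers really are ready. A small Python sketch, assuming you paste the resulting text into the /gsd:new-project interview yourself. `decisions_brief` is a hypothetical helper, not part of either skill, and the sample decisions are loosely based on the database session earlier in this post.

```python
def decisions_brief(decisions: dict[str, str]) -> str:
    """Render locked-in grill-me decisions as notes for the GSD intake interview."""
    lines = ["Decisions already made (from a grill-me session):"]
    lines += [f"- {topic}: {choice}" for topic, choice in decisions.items()]
    return "\n".join(lines)


brief = decisions_brief({
    "Database": "PostgreSQL (query patterns rely on relational joins)",
    "Data volume": "about 10k records per day",
    "Read/write ratio": "80% reads, 20% writes",
    "Read scaling": "read replicas",
})
print(brief)
```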

Installation

grill-me:

# Create SKILL.md manually with the 5-sentence prompt
# Or if available as package:
npx skills add grill-me

GSD:

# For Codex specifically
npx get-shit-done-cc --codex --global

# Verify installation
$gsd-help

What Still Frustrates Me

grill-me:

No execution phase. I still have to code or prompt Codex to code after planning.
No artifacts. The shared understanding stays in the conversation, not in a file I can reference later.

GSD:

Overhead for simple tasks. Using GSD for a single function refactor feels like using a bulldozer for gardening.
Learning curve. The full workflow takes time to understand. I still forget commands sometimes.

My Decision Matrix

                    ┌─────────────────────┐
                    │ What are you doing? │
                    └─────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
   Single decision      Multi-phase        Complete app
   (1 complex choice)   (2+ phases)        (new project)
         │                   │                   │
         ▼                   ▼                   ▼
   ┌───────────┐       ┌───────────┐       ┌───────────┐
   │ grill-me  │       │    GSD    │       │    GSD    │
   └───────────┘       └───────────┘       └───────────┘
         │                   │                   │
         ▼                   ▼                   ▼
   Interview only      discuss→plan        new-project
                       →execute            →full cycle
                       →verify
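The matrix is small enough to express as a function. A Python sketch, assuming the two inputs that drive it are the number of phases and whether the work starts a new project; `pick_tool` is a hypothetical name.

```python
def pick_tool(phases: int, new_project: bool) -> str:
    """Encode the decision matrix: which approach fits which kind of work."""
    if phases < 1:
        raise ValueError("phases must be at least 1")
    if new_project:
        # Complete app: start from scratch with the full GSD cycle.
        return "GSD: /gsd:new-project, then the full cycle"
    if phases >= 2:
        # Multi-phase work: run the per-phase GSD loop.
        return "GSD: discuss -> plan -> execute -> verify"
    # A single complex choice: interview only.
    return "grill-me: interview only"


print(pick_tool(phases=1, new_project=False))  # grill-me: interview only
```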

Summary

In this post, I compared grill-me and GSD skills for Codex planning. The key insight: grill-me solves the “AI makes wrong assumptions” problem for single complex decisions. GSD solves the “context rot” problem for multi-phase projects.

grill-me:

Tiny, zero-overhead (5 sentences)
Perfect for architectural decisions and complex single features
Leaves execution to you

GSD:

Complete lifecycle management
Perfect for building applications and multi-phase work
Handles execution with parallel agents and atomic commits

The Reddit insight captures it: “grill-me is great for larger/nuanced features; I wouldn’t use it for everything.” Use grill-me when you need clarity on one thing. Use GSD when you need to build many things.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 GSD GitHub Repository
👨‍💻 GSD npm Package
👨‍💻 Reddit Discussion: grill-me skill experience
👨‍💻 GSD Discord Community

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
