Does Using the grill-me Skill in Codex Waste Tokens? The Answer Surprised Me
Problem
When I first saw the grill-me skill for Codex, I had one immediate concern: “Does this waste tokens?”
The skill asks you dozens of questions before generating any code. Each question uses tokens. Each answer uses tokens. For complex features, I might answer 40, 60, even 80 questions. That sounds expensive.
I posted my concern on Reddit:
"Does this potentially use up more requests/tokens rather than doing the first approach?"
The first approach is to let Codex generate a plan immediately, then iterate through revision cycles. My intuition was that extended Q&A must cost more than fast generation.
The answers surprised me.
The Hidden Cost of Fast Generation
Before I explain why grill-me saves tokens, let me show what the “fast” approach actually costs.
When Codex generates immediately without clarification, I get this pattern:
Round 1: Codex makes assumptions
- Generates 500 lines with wrong architecture
- I review, find 3 major issues
- 3 back-and-forth exchanges to clarify
- Tokens: ~5,000 generation + ~3,000 chat
Round 2: Codex revises with one wrong assumption remaining
- Generates another 500 lines
- I find 1 issue
- 2 more back-and-forths
- Tokens: ~4,500 generation + ~2,000 chat
Round 3: Codex finally generates correct code
- 500 lines, correct this time
- Tokens: ~5,000 generation
Total: ~19,500 tokens, 3 generation rounds, 5 clarification exchanges, 45+ minutes
Each wrong generation is expensive. The chat tokens during revisions are informational, but the generation tokens from Rounds 1 and 2 are pure waste: they produced wrong code that gets discarded.
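The total is a plain sum, so it is easy to check. Here is a minimal Python sketch of the arithmetic, using the rough per-round estimates from this section. The variable names are my own, and the figures are estimates, not measurements.

```python
# Estimated tokens per round for the "fast generation" approach.
# Each tuple is (generation_tokens, chat_tokens); these are the rough
# estimates from the rounds above, not measured values.
rounds = [
    (5_000, 3_000),  # Round 1: wrong architecture, 3 clarifying exchanges
    (4_500, 2_000),  # Round 2: one wrong assumption left, 2 more exchanges
    (5_000, 0),      # Round 3: correct code, no further chat
]

total_tokens = sum(gen + chat for gen, chat in rounds)

# Generation tokens from every round except the last paid for code
# that was thrown away.
discarded_generation = sum(gen for gen, _ in rounds[:-1])

print(total_tokens)          # 19500
print(discarded_generation)  # 9500
```

On these estimates, roughly half of the total went into code that was discarded.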
What grill-me Actually Costs
The grill-me skill itself is tiny. Here’s what it does:
1. Read the task request
2. Generate questions to understand requirements
3. Ask questions one at a time
4. User answers each question
5. After sufficient understanding, generate correct code
The skill invocation itself uses “almost no tokens”, as one Reddit commenter noted. It’s a 5-sentence prompt that tells Codex to interview you.
Let me calculate a typical grill-me session:
grill-me session for complex feature:
Skill invocation: ~100 tokens
40 questions asked:
- Each question: ~50 tokens
- Total: 2,000 tokens
40 answers from me:
- Each answer: ~100 tokens (brief responses)
- Total: 4,000 tokens
One correct generation:
- 500 lines generated once
- Tokens: ~5,000
Total: ~11,100 tokens, zero revision cycles, 20 minutes
Compare that to the revision-heavy approach: ~11,100 tokens vs ~19,500 tokens. grill-me saves ~43% of tokens.
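The same estimate can be written as a small function, which makes it easy to plug in your own numbers. This is only a sketch: `grill_me_cost` is a hypothetical helper, and its default values are the per-question, per-answer, and generation estimates from this section, not anything reported by Codex.

```python
def grill_me_cost(questions, invocation=100, per_question=50,
                  per_answer=100, generation=5_000):
    """Estimated total tokens for one grill-me session (hypothetical helper)."""
    return invocation + questions * (per_question + per_answer) + generation


# Estimated total for the three-round, revision-heavy example.
REVISION_HEAVY_TOTAL = 19_500

cost_40 = grill_me_cost(40)
saving = (REVISION_HEAVY_TOTAL - cost_40) / REVISION_HEAVY_TOTAL

print(cost_40)          # 11100
print(f"{saving:.0%}")  # 43%
```

With 40 questions, the saving works out to about 43% on these figures.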
But wait — what if I answer 80 questions?
The 80-Question Session
One Reddit commenter shared this:
"Worth mentioning you can end up answering 80 questions as I did earlier this week"
80 questions sounds extreme. But even at 80 questions, the math still favors grill-me:
Skill invocation: ~100 tokens
80 questions: ~4,000 tokens
80 answers: ~8,000 tokens
One correct generation: ~5,000 tokens
Total: ~17,100 tokens
Still cheaper than ~19,500 tokens for revision cycles. And the user confirmed they still saved overall.
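A natural follow-up is how many questions it would take before the interview stops paying off. Here is a rough break-even sketch under the same assumptions (about 50 tokens per question, 100 per answer, and the ~19,500-token revision-heavy baseline). The constant names are mine.

```python
INVOCATION = 100
PER_EXCHANGE = 50 + 100       # one question plus one brief answer
GENERATION = 5_000
REVISION_HEAVY_TOTAL = 19_500

# Tokens available for Q&A before the session costs as much as the
# revision-heavy baseline.
qa_budget = REVISION_HEAVY_TOTAL - INVOCATION - GENERATION
break_even_questions = qa_budget // PER_EXCHANGE

print(qa_budget)             # 14400
print(break_even_questions)  # 96
```

On these estimates, the two approaches cost the same at around 96 questions. Longer answers or a cheaper baseline would move that point lower.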
Why Q&A Tokens Are Different
The key insight from Reddit:
"Running skill itself uses almost no tokens, then you end up with a better state after a coding run, meaning less back and forth."
Q&A tokens build context. Revision tokens rebuild wrong code.
When I answer a grill-me question like “Should this be synchronous or asynchronous?”, that answer becomes useful context for the final generation. It’s not waste — it’s investment in correct output.
When Codex generates wrong code and I say “No, use async”, the generation tokens that produced the synchronous version are waste. They produced nothing useful.
Q&A tokens: Building context → leads to correct code
Revision tokens: Generating wrong code → discarded
When grill-me Might Cost More
grill-me isn’t for everything. If I use it for trivial tasks, I waste tokens:
Skip grill-me for:
- Simple bug fixes (one-line changes)
- Small refactoring (rename variable, extract function)
- Routine updates (bump version, update config)
Use grill-me for:
- Complex architectural decisions
- Features with many edge cases
- Work where wrong assumptions = expensive rewrites
One commenter confirmed this:
"I wouldn't use it for everything but for larger or more nuanced features, it's been great"
If I answered 80 questions for a one-line fix, grill-me would cost more. But that’s a misuse of the tool: trivial tasks don’t need interviews.
Token Optimization Tips
To maximize savings with grill-me:
Give brief answers:
- "Yes, async" instead of explaining async benefits
- Let Codex explore the codebase instead of you describing files
Accept recommendations when you agree:
- Don't debate every suggestion
- If Codex proposes a reasonable approach, accept it
Know when to stop:
- If questions become repetitive, say "Proceed with current understanding"
- grill-me will then generate code
The goal is efficient context-building, not exhaustive documentation.
Real Session Tracking
If your AI tool shows token counts, you can verify this yourself:
Session 1: Direct generation for complex feature
- Tokens: 24,500
- Revision cycles: 3
- Time: 45 minutes with interruptions
Session 2: grill-me for similar complex feature
- Tokens: 12,000
- Questions answered: 35
- Time: 20 minutes focused Q&A + 5 minutes code review
I’ve seen consistent savings in my own usage. The more complex the feature, the bigger the savings.
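If you log your own sessions, a few lines of Python are enough to compare them. The sketch below uses the two sessions above as sample data. The `Session` record is a hypothetical structure of my own, not something Codex or grill-me produces, and a single pair of sessions is an anecdote, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class Session:
    label: str
    tokens: int
    minutes: int


# The two sessions described above, used as sample data.
direct = Session("direct generation", tokens=24_500, minutes=45)
grilled = Session("grill-me", tokens=12_000, minutes=25)  # 20 min Q&A + 5 min review

token_saving = 1 - grilled.tokens / direct.tokens
minutes_saved = direct.minutes - grilled.minutes

print(f"{token_saving:.0%}")  # 51%
print(minutes_saved)          # 20
```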
The Reason
grill-me saves tokens because it invests upfront in context quality, not wasted output.
The intuition many users have — “Q&A must cost more tokens” — is wrong because it ignores hidden revision costs. Each wrong generation is expensive. Each Q&A exchange builds understanding.
The Reddit consensus:
"Possibly but you probably save more than you waste in the end"
The key is selective use. Don’t interview for a one-line fix. Do interview for architectural decisions where wrong assumptions cascade into expensive rewrites.
Summary
In this post, I showed why the grill-me skill saves tokens despite lengthy Q&A sessions. The skill itself costs almost nothing, and the Q&A tokens build useful context. Revision tokens, by contrast, pay for wrong code that gets discarded.
Even 80-question sessions can save tokens compared to revision-heavy workflows. The savings grow with feature complexity.
The takeaway: use grill-me for complex work, skip it for trivial tasks. For complex features, the token math favors upfront context-building over downstream corrections.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.