
Does Using the grill-me Skill in Codex Waste Tokens? The Answer Surprised Me

Problem

When I first saw the grill-me skill for Codex, I had one immediate concern: “Does this waste tokens?”

The skill asks you dozens of questions before generating any code. Each question uses tokens. Each answer uses tokens. For complex features, I might answer 40, 60, even 80 questions. That sounds expensive.

I posted my concern on Reddit:

My Reddit question
"Does this potentially use up more requests/tokens rather than doing the first approach?"

The first approach is to let Codex generate a plan immediately and then iterate through revision cycles. My intuition was that an extended Q&A must cost more than fast generation.

The answers surprised me.

The Hidden Cost of Fast Generation

Before I explain why grill-me saves tokens, let me show what the “fast” approach actually costs.

When Codex generates immediately without clarification, I get this pattern:

Revision cycle costs
Round 1: Codex makes assumptions
- Generates 500 lines with wrong architecture
- I review, find 3 major issues
- 3 back-and-forth exchanges to clarify
- Tokens: ~5,000 generation + ~3,000 chat
Round 2: Codex revises with one wrong assumption remaining
- Generates another 500 lines
- I find 1 issue
- 2 more back-and-forths
- Tokens: ~4,500 generation + ~2,000 chat
Round 3: Codex finally generates correct code
- 500 lines, correct this time
- Tokens: ~5,000 generation
Total: ~19,500 tokens, 3 generation rounds, 5 back-and-forth exchanges, 45+ minutes

Each wrong generation is expensive. The chat tokens during revisions at least carry information, but the generation tokens from rounds 1 and 2 (~9,500 in this example) are pure waste: they produce wrong code that gets discarded.
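To make the arithmetic easy to check, here is a minimal Python sketch that adds up the per-round estimates from the breakdown above. The variable names are mine, and the numbers are the rough estimates from this post, not measurements.

```python
# Per-round estimates copied from the breakdown above: (generation, chat) tokens.
rounds = [
    (5_000, 3_000),  # round 1: wrong architecture, 3 clarifying exchanges
    (4_500, 2_000),  # round 2: one wrong assumption left, 2 more exchanges
    (5_000, 0),      # round 3: correct code, no further chat
]

generation_total = sum(gen for gen, _ in rounds)
chat_total = sum(chat for _, chat in rounds)
total_tokens = generation_total + chat_total

# Only the last round produced code that was kept; earlier generations were discarded.
discarded_generation = sum(gen for gen, _ in rounds[:-1])

print(total_tokens)          # 19500
print(discarded_generation)  # 9500
```

Under these estimates, nearly half of the total went into code that was thrown away.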

What grill-me Actually Costs

The grill-me skill itself is tiny. Here’s what it does:

grill-me skill structure
1. Read the task request
2. Generate questions to understand requirements
3. Ask questions one at a time
4. User answers each question
5. After sufficient understanding, generate correct code

The skill invocation itself uses “almost no tokens”, as one Reddit commenter noted. It’s a 5-sentence prompt that tells Codex to interview you.

Let me calculate a typical grill-me session:

grill-me token breakdown
grill-me session for complex feature:
Skill invocation: ~100 tokens
40 questions asked:
- Each question: ~50 tokens
- Total: 2,000 tokens
40 answers from me:
- Each answer: ~100 tokens (brief responses)
- Total: 4,000 tokens
One correct generation:
- 500 lines generated once
- Tokens: ~5,000
Total: ~11,100 tokens, zero revision cycles, 20 minutes

Compare that to the revision-heavy approach: ~11,100 tokens vs ~19,500 tokens. grill-me saves ~43% of tokens.
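The same comparison can be written as a small function. This is a rough model, not a measurement: the function name and parameters are hypothetical, and the defaults are simply the per-item estimates from the breakdown above, with each message counted once.

```python
def grill_me_tokens(questions, per_question=50, per_answer=100,
                    invocation=100, generation=5_000):
    """Estimated tokens for one grill-me session (defaults are this post's estimates)."""
    return invocation + questions * (per_question + per_answer) + generation


REVISION_WORKFLOW_TOKENS = 19_500  # total from the revision-cycle breakdown

session_tokens = grill_me_tokens(40)
saved_tokens = REVISION_WORKFLOW_TOKENS - session_tokens
saved_share = saved_tokens / REVISION_WORKFLOW_TOKENS

print(session_tokens)        # 11100
print(saved_tokens)          # 8400
print(f"{saved_share:.0%}")  # 43%
```

Changing `questions` or the per-item defaults shows how sensitive the result is to answer length.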

But wait — what if I answer 80 questions?

The 80-Question Session

One Reddit commenter shared this:

Reddit evidence on question count
"Worth mentioning you can end up answering 80 questions as I did earlier this week"

80 questions sounds extreme. But even at 80 questions, the math still favors grill-me:

80-question token calculation
Skill invocation: ~100 tokens
80 questions: ~4,000 tokens
80 answers: ~8,000 tokens
One correct generation: ~5,000 tokens
Total: ~17,100 tokens

That is still cheaper than the ~19,500 tokens for the revision cycles, although the margin shrinks to about 12%. And the user confirmed they still saved overall.
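A natural follow-up is how many questions it would take before the Q&A stops paying off. The sketch below solves for that break-even point, assuming the same per-question and per-answer estimates used in this post and the ~19,500-token revision workflow as the baseline. All constant names are my own.

```python
REVISION_WORKFLOW_TOKENS = 19_500  # baseline from the revision-cycle breakdown
INVOCATION = 100                   # skill prompt
GENERATION = 5_000                 # one correct generation
TOKENS_PER_EXCHANGE = 50 + 100     # one question plus one brief answer

# Tokens left over for Q&A before grill-me costs as much as the baseline.
qa_budget = REVISION_WORKFLOW_TOKENS - INVOCATION - GENERATION
break_even_questions = qa_budget // TOKENS_PER_EXCHANGE

print(qa_budget)             # 14400
print(break_even_questions)  # 96
```

With these assumptions, grill-me only matches the cost of the revision workflow at 96 questions. Longer answers or a cheaper baseline would move that point earlier.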

Why Q&A Tokens Are Different

The key insight from Reddit:

Reddit commenter explanation
"Running skill itself uses almost no tokens, then you end up with a better state
after a coding run, meaning less back and forth."

Q&A tokens build context. Revision tokens rebuild wrong code.

When I answer a grill-me question like “Should this be synchronous or asynchronous?”, that answer becomes useful context for the final generation. It’s not waste — it’s investment in correct output.

When Codex generates wrong code and I say “No, use async”, the generation tokens that produced the synchronous version are waste. They produced nothing useful.

Token quality comparison
Q&A tokens: Building context → leads to correct code
Revision tokens: Generating wrong code → discarded

When grill-me Might Cost More

grill-me isn’t for everything. If I use it for trivial tasks, I waste tokens:

When to skip grill-me
Skip grill-me for:
- Simple bug fixes (one-line changes)
- Small refactoring (rename variable, extract function)
- Routine updates (bump version, update config)
Use grill-me for:
- Complex architectural decisions
- Features with many edge cases
- Work where wrong assumptions = expensive rewrites

One commenter confirmed this:

Reddit usage guidance
"I wouldn't use it for everything but for larger or more nuanced features, it's been great"

If I answered 80 questions for a one-line fix, grill-me would cost far more than just making the change. But that’s a misuse of the tool: trivial tasks don’t need interviews.
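To see how lopsided the trivial case is, here is a sketch comparing both workflows for a small change. The 200-token size for a one-line fix is a hypothetical figure I picked for illustration, and both function names are mine. The other numbers reuse this post's estimates.

```python
def direct_tokens(generation, wasted_rounds=0, chat_per_round=0):
    """Direct generation: each wasted round costs a generation plus clarification chat."""
    return (wasted_rounds + 1) * generation + wasted_rounds * chat_per_round


def grill_me_tokens(questions, generation, per_exchange=150, invocation=100):
    """grill-me: skill prompt, Q&A exchanges, then a single generation."""
    return invocation + questions * per_exchange + generation


ONE_LINE_FIX = 200  # hypothetical size of a trivial change, in tokens

direct_cost = direct_tokens(ONE_LINE_FIX)
interview_cost = grill_me_tokens(80, ONE_LINE_FIX)

print(direct_cost)     # 200
print(interview_cost)  # 12300
```

When the generation is cheap and unlikely to be wrong, there are no wasted rounds for the interview to prevent, so the Q&A is pure overhead.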

Token Optimization Tips

To maximize savings with grill-me:

Efficient Q&A practices
Give brief answers:
- "Yes, async" instead of explaining async benefits
- Let Codex explore the codebase instead of you describing files
Accept recommendations when you agree:
- Don't debate every suggestion
- If Codex proposes a reasonable approach, accept it
Know when to stop:
- If questions become repetitive, say "Proceed with current understanding"
- grill-me will then generate code

The goal is efficient context-building, not exhaustive documentation.

Real Session Tracking

If your AI tool shows token counts, you can verify this yourself:

Monitoring your own sessions
Session 1: Direct generation for complex feature
- Tokens: 24,500
- Revision cycles: 3
- Time: 45 minutes with interruptions
Session 2: grill-me for similar complex feature
- Tokens: 12,000
- Questions answered: 35
- Time: 20 minutes focused Q&A + 5 minutes code review

I’ve seen consistent savings in my own usage. The more complex the feature, the bigger the savings.
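If you log your own sessions, a few lines of Python are enough to turn the raw counts into savings percentages. The sketch below uses the two sessions listed above. The dictionary layout and key names are my own choice, not the output format of any tool.

```python
# Figures from the two sessions above; minutes for grill-me include the 5-minute review.
sessions = {
    "direct": {"tokens": 24_500, "minutes": 45},
    "grill-me": {"tokens": 12_000, "minutes": 20 + 5},
}


def saving(before, after):
    """Fraction saved when going from `before` to `after`."""
    return (before - after) / before


token_saving = saving(sessions["direct"]["tokens"], sessions["grill-me"]["tokens"])
time_saving = saving(sessions["direct"]["minutes"], sessions["grill-me"]["minutes"])

print(f"{token_saving:.0%}")  # 51%
print(f"{time_saving:.0%}")   # 44%
```

Two sessions are only an anecdote, so tracking several features of similar size gives a more reliable picture.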

The Reason

grill-me saves tokens because it spends them upfront on context quality rather than on wasted output.

The intuition many users have — “Q&A must cost more tokens” — is wrong because it ignores hidden revision costs. Each wrong generation is expensive. Each Q&A exchange builds understanding.

The Reddit consensus:

Reddit summary
"Possibly but you probably save more than you waste in the end"

The key is selective use. Don’t interview for a one-line fix. Do interview for architectural decisions where wrong assumptions cascade into expensive rewrites.

Summary

In this post, I showed why the grill-me skill saves tokens despite lengthy Q&A sessions. The skill itself costs almost nothing, and the Q&A tokens build useful context, whereas revision cycles spend tokens on wrong code that gets discarded.

Even 80-question sessions can save tokens compared to revision-heavy workflows. The savings grow with feature complexity.

The takeaway: use grill-me for complex work, skip it for trivial tasks. The token math consistently favors upfront context-building over downstream corrections.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
