Does Using the grill-me Skill in Codex Waste Tokens? The Answer Surprised Me

Apr 1, 2026

Problem

When I first saw the grill-me skill for Codex, I had one immediate concern: “Does this waste tokens?”

The skill asks you dozens of questions before generating any code. Each question uses tokens. Each answer uses tokens. For complex features, I might answer 40, 60, even 80 questions. That sounds expensive.

I posted my concern on Reddit:

"Does this potentially use up more requests/tokens rather than doing the first approach?"

The first approach was to let Codex generate a plan immediately, then iterate through revision cycles. My intuition was that extended Q&A must cost more than fast generation.

The answers surprised me.

The Hidden Cost of Fast Generation

Before I explain why grill-me saves tokens, let me show what the “fast” approach actually costs.

When Codex generates immediately without clarification, I get this pattern:

Round 1: Codex makes assumptions
  - Generates 500 lines with wrong architecture
  - I review, find 3 major issues
  - 3 back-and-forth exchanges to clarify
  - Tokens: ~5,000 generation + ~3,000 chat

Round 2: Codex revises with one wrong assumption remaining
  - Generates another 500 lines
  - I find 1 issue
  - 2 more back-and-forths
  - Tokens: ~4,500 generation + ~2,000 chat

Round 3: Codex finally generates correct code
  - 500 lines, correct this time
  - Tokens: ~5,000 generation

Total: ~19,500 tokens, 3 generation rounds, 5 back-and-forth exchanges, 45+ minutes

Each wrong generation is expensive. The chat tokens spent during revisions at least carry information, but the generation tokens from the first two rounds (~9,500 here) are pure waste: they produced wrong code that was discarded.
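To sanity-check that tally, here is a minimal Python sketch that adds up the three rounds. The numbers are the same rough estimates as in the listing, not measurements, and the list-of-dicts layout is just an illustrative structure I picked.

```python
# Token estimates copied from the three rounds above (rough figures, not measurements).
rounds = [
    {"generation": 5_000, "chat": 3_000},  # Round 1: wrong architecture
    {"generation": 4_500, "chat": 2_000},  # Round 2: one wrong assumption left
    {"generation": 5_000, "chat": 0},      # Round 3: correct code
]

# Everything spent across all rounds.
total = sum(r["generation"] + r["chat"] for r in rounds)

# Generation tokens from every round except the last produced discarded code.
discarded = sum(r["generation"] for r in rounds[:-1])

print(total)      # 19500
print(discarded)  # 9500
```

Roughly half of the total went into code that never shipped.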

What grill-me Actually Costs

The grill-me skill itself is tiny. Here’s what it does:

1. Read the task request
2. Generate questions to understand requirements
3. Ask questions one at a time
4. User answers each question
5. After sufficient understanding, generate correct code

The skill invocation itself uses “almost no tokens”, as one Reddit commenter noted. It’s a 5-sentence prompt that tells Codex to interview you.

Let me calculate a typical grill-me session:

grill-me session for complex feature:

Skill invocation: ~100 tokens

40 questions asked:
  - Each question: ~50 tokens
  - Total: 2,000 tokens

40 answers from me:
  - Each answer: ~100 tokens (brief responses)
  - Total: 4,000 tokens

One correct generation:
  - 500 lines generated once
  - Tokens: ~5,000

Total: ~11,100 tokens, zero revision cycles, 20 minutes

Compare that to the revision-heavy approach: ~11,100 tokens vs ~19,500 tokens. grill-me saves ~43% of tokens.
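The comparison is easy to reproduce. The sketch below assumes the per-question and per-answer estimates from the session breakdown. `grill_me_tokens` and `savings_pct` are hypothetical helper names of mine, not part of the skill.

```python
def grill_me_tokens(questions, per_question=50, per_answer=100,
                    invocation=100, generation=5_000):
    """Estimated tokens for a grill-me session (all defaults are rough estimates)."""
    return invocation + questions * (per_question + per_answer) + generation


def savings_pct(baseline, alternative):
    """Percentage of the baseline saved by switching to the alternative."""
    return 100 * (baseline - alternative) / baseline


REVISION_TOTAL = 19_500  # estimated total for the revision-heavy approach

forty = grill_me_tokens(40)
print(forty)                                      # 11100
print(round(savings_pct(REVISION_TOTAL, forty)))  # 43
```

With these inputs the saving works out to about 43%. Change the defaults to match your own answer length and the result moves accordingly.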

But wait — what if I answer 80 questions?

The 80-Question Session

One Reddit commenter shared this:

"Worth mentioning you can end up answering 80 questions as I did earlier this week"

80 questions sounds extreme. But even at 80 questions, the math still favors grill-me:

Skill invocation: ~100 tokens

80 questions: ~4,000 tokens

80 answers: ~8,000 tokens

One correct generation: ~5,000 tokens

Total: ~17,100 tokens

That is still cheaper than the ~19,500 tokens of the revision-heavy approach, although the margin shrinks to about 12%. And the user confirmed they still saved overall.
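A natural follow-up is where the break-even point sits. Here is a small sketch under the same assumptions as before (about 150 tokens per question-and-answer exchange, ~100 for the invocation, ~5,000 for one generation). `break_even_questions` is a hypothetical helper for illustration only.

```python
def break_even_questions(baseline, per_exchange=150, invocation=100, generation=5_000):
    """Largest number of Q&A exchanges whose total cost does not exceed the baseline."""
    return (baseline - invocation - generation) // per_exchange


limit = break_even_questions(19_500)
print(limit)  # 96
```

Under these estimates, a session would need more than 96 questions before it costs more than the three-round revision workflow.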

Why Q&A Tokens Are Different

The key insight from Reddit:

"Running skill itself uses almost no tokens, then you end up with a better state
after a coding run, meaning less back and forth."

Q&A tokens build context. Revision tokens rebuild wrong code.

When I answer a grill-me question like “Should this be synchronous or asynchronous?”, that answer becomes useful context for the final generation. It’s not waste — it’s investment in correct output.

When Codex generates wrong code and I say “No, use async”, the generation tokens that produced the synchronous version are waste. They produced nothing useful.

Q&A tokens:      Building context → leads to correct code
Revision tokens: Generating wrong code → discarded

When grill-me Might Cost More

grill-me isn’t for everything. If I use it for trivial tasks, I waste tokens:

Skip grill-me for:
- Simple bug fixes (one-line changes)
- Small refactoring (rename variable, extract function)
- Routine updates (bump version, update config)

Use grill-me for:
- Complex architectural decisions
- Features with many edge cases
- Work where wrong assumptions = expensive rewrites

One commenter confirmed this:

"I wouldn't use it for everything but for larger or more nuanced features, it's been great"

If I answered 80 questions for a one-line fix, grill-me would cost more. But that’s a misuse of the tool: trivial tasks don’t need interviews.

Token Optimization Tips

To maximize savings with grill-me:

Give brief answers:
  - "Yes, async" instead of explaining async benefits
  - Let Codex explore the codebase instead of you describing files

Accept recommendations when you agree:
  - Don't debate every suggestion
  - If Codex proposes a reasonable approach, accept it

Know when to stop:
  - If questions become repetitive, say "Proceed with current understanding"
  - grill-me will then generate code

The goal is efficient context-building, not exhaustive documentation.

Real Session Tracking

If your AI tool shows token counts, you can verify this yourself. Here are two sessions from my own usage:

Session 1: Direct generation for complex feature
  - Tokens: 24,500
  - Revision cycles: 3
  - Time: 45 minutes with interruptions

Session 2: grill-me for similar complex feature
  - Tokens: 12,000
  - Questions answered: 35
  - Time: 20 minutes focused Q&A + 5 minutes code review

I’ve seen consistent savings in my own usage. The more complex the feature, the bigger the savings.
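If you want to track your own sessions the same way, a few lines of Python are enough. This is only a sketch: the `Session` class and `compare` function are hypothetical names, and the figures are the two sessions listed above, with Session 2 counted as 25 minutes (20 of Q&A plus 5 of review).

```python
from dataclasses import dataclass


@dataclass
class Session:
    label: str
    tokens: int
    minutes: int


def compare(baseline: Session, candidate: Session) -> dict:
    """Summarize how much the candidate session saved relative to the baseline."""
    saved = baseline.tokens - candidate.tokens
    return {
        "tokens_saved": saved,
        "pct_saved": round(100 * saved / baseline.tokens, 1),
        "minutes_saved": baseline.minutes - candidate.minutes,
    }


direct = Session("direct generation", tokens=24_500, minutes=45)
grilled = Session("grill-me", tokens=12_000, minutes=25)

result = compare(direct, grilled)
print(result)
# {'tokens_saved': 12500, 'pct_saved': 51.0, 'minutes_saved': 20}
```

Two sessions are a small sample, so treat the percentage as a data point rather than a benchmark.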

The Reason

grill-me saves tokens because it invests upfront in context quality, not wasted output.

The intuition many users have — “Q&A must cost more tokens” — is wrong because it ignores hidden revision costs. Each wrong generation is expensive. Each Q&A exchange builds understanding.

The Reddit consensus:

"Possibly but you probably save more than you waste in the end"

The key is selective use. Don’t interview for a one-line fix. Do interview for architectural decisions where wrong assumptions cascade into expensive rewrites.

Summary

In this post, I showed why the grill-me skill saves tokens despite lengthy Q&A sessions. The skill itself costs almost nothing, and the Q&A tokens build useful context, whereas revision-heavy workflows spend generation tokens on wrong code that gets discarded.

Even 80-question sessions can save tokens compared to revision-heavy workflows. The savings grow with feature complexity.

The takeaway: use grill-me for complex work, skip it for trivial tasks. The token math consistently favors upfront context-building over downstream corrections.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit Discussion: grill-me skill token usage
👨‍💻 grill-me skill on skills.sh

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!