How to Build Custom Claude Code Skills with Role-Based Prompting
I kept repeating the same instructions to Claude Code: “Write tests first, check coverage, use this naming pattern, review the code.” Every session felt like I was retraining Claude from scratch.
Then I discovered skills. A skill is a reusable instruction package that Claude loads automatically when triggered. But the real insight came from a Reddit thread about gstack: the key pattern isn’t just having instructions - it’s giving AI agents distinct roles in separate context windows.
The Problem: Context Repetition
Every time I started a new coding session, I had to:
- Explain my testing requirements (TDD, 80% coverage)
- Describe my naming conventions
- Outline my review process
- Specify my security checklist
This wasted tokens and time. Worse, Claude sometimes forgot mid-session and I had to remind it again.
I tried putting instructions in a README file, but Claude wouldn’t consistently read it. I tried adding comments to my code, but that only helped for specific files.
The Solution: SKILL.md Files
Skills solve this problem with a simple structure:
    skill-name/
    ├── SKILL.md (required)
    │   ├── YAML frontmatter (name + description)
    │   └── Markdown instructions
    ├── scripts/ (optional - executable code)
    ├── references/ (optional - documentation)
    └── assets/ (optional - templates, files)

The critical part is the YAML frontmatter:

    ---
    name: tdd-workflow
    description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.
    ---

I initially made a mistake with the description field. I wrote:

    description: Test-driven development workflow with 80% coverage requirements.

This was too vague. Claude didn’t know when to use the skill. The description needs two things: what the skill does AND when to trigger it.

    description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.

Now Claude automatically loads this skill whenever I mention “new feature” or “fix bug”.
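To catch the "too vague" mistake before it costs a session, a small lint can check that the frontmatter has both fields and that the description says when to use the skill. This is only a rough sketch: the function names and the `TRIGGER_HINTS` phrases are my own assumptions, the parser handles only simple one-line `key: value` pairs, and it says nothing about how Claude actually matches descriptions.

```python
import re

# Hypothetical heuristic: phrases suggesting the description states WHEN to trigger.
TRIGGER_HINTS = ("use this skill when", "use when", "when ")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple one-line `key: value` pairs between the leading `---` lines."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def description_problems(text: str) -> list[str]:
    """Return a list of human-readable problems found in the frontmatter."""
    fields = parse_frontmatter(text)
    problems = []
    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"missing '{required}' field")
    description = fields.get("description", "").lower()
    if description and not any(hint in description for hint in TRIGGER_HINTS):
        problems.append("description does not say when to use the skill")
    return problems
```

Run against my first attempt, this flags the missing "when"; the revised description passes with an empty list.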
Progressive Disclosure: Three Loading Levels
Skills use a three-level loading system that I didn’t understand at first:
    Level 1: Metadata (name + description)
      → Always in context (~100 words)
      → Claude sees this for every skill

    Level 2: SKILL.md body
      → Loaded when skill triggers (<5k words)
      → Full instructions, patterns, examples

    Level 3: Bundled resources
      → Loaded as needed (scripts, references)
      → Unlimited because scripts execute without context

This means the description field is the trigger. I wasted time writing detailed “When to Use” sections in the SKILL.md body, but Claude never saw them because the body only loads after triggering.
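A toy model helped me reason about what each level costs. The sketch below is not how Claude Code is implemented; the `Skill` class and `context_cost` function are hypothetical, and word counts stand in for tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # Level 1: always in context
    body: str         # Level 2: loaded only after the skill triggers
    references: dict[str, str] = field(default_factory=dict)  # Level 3: on demand


def word_count(text: str) -> int:
    return len(text.split())


def context_cost(skill: Skill, triggered: bool, opened: tuple[str, ...] = ()) -> int:
    """Rough number of words this skill contributes to the context."""
    cost = word_count(skill.name) + word_count(skill.description)
    if triggered:
        cost += word_count(skill.body)
        # Reference files count only once they are actually opened.
        cost += sum(word_count(skill.references[name]) for name in opened)
    return cost
```

An idle skill costs only its name and description, which is why a "When to Use" section in the body cannot influence triggering: in this model it is not in context until `triggered` is already true.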
Role-Based Prompting: The gstack Insight
A Reddit thread about gstack revealed the deeper pattern. The commenter noted:
“The insight that you get better output by separating planning, review, and QA into distinct context windows is valuable.”
This isn’t about one skill doing everything. It’s about multiple skills with distinct roles:
    ┌─────────────┐
    │   Planner   │ → Think → Plan
    │    Skill    │   (separate context)
    └─────────────┘
           │
           ↓ (output passed to next)
    ┌─────────────┐
    │   Builder   │ → Build → Code
    │    Skill    │   (fresh context)
    └─────────────┘
           │
           ↓ (output passed to next)
    ┌─────────────┐
    │  Reviewer   │ → Review → QA
    │    Skill    │   (fresh context)
    └─────────────┘

The gstack approach used roles like CEO, Engineering Manager, Designer, Reviewer & QA Lead, Security Officer. Each role operates in a clean context window, preventing the confusion that happens when one agent tries to do everything.
I adapted this for my workflow:
| Role | Skill | Trigger |
|---|---|---|
| Planner | planning-with-files | “plan implementation” |
| Builder | tdd-workflow | “new feature” or “fix bug” |
| Reviewer | code-reviewer | “review code” |
| Security | security-review | “security check” |
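The hand-off between roles can be sketched as a tiny pipeline in which each role sees only its own instructions plus the previous role's output. Everything here is hypothetical: `Agent` is a stand-in for whatever actually calls the model, and the role instructions are shortened placeholders, not the contents of my skills.

```python
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Agent = Callable[[str], str]

ROLES = [
    ("Planner", "Produce an implementation plan."),
    ("Builder", "Write tests first, then the code."),
    ("Reviewer", "Review the result and list issues."),
]


def run_pipeline(task: str, agent: Agent) -> list[tuple[str, str]]:
    """Run each role with a fresh prompt; only the previous output carries over."""
    handoff = task
    results = []
    for role, instructions in ROLES:
        # The prompt is rebuilt from scratch, so earlier prompts never leak in.
        prompt = f"Role: {role}\n{instructions}\n\nInput:\n{handoff}"
        output = agent(prompt)
        results.append((role, output))
        handoff = output
    return results


def echo_agent(prompt: str) -> str:
    """Fake agent for demonstration: answers with the role name only."""
    role = prompt.splitlines()[0].removeprefix("Role: ")
    return f"{role} output"
```

The design choice worth noticing is that `handoff` is the only state shared between roles, which is the "separate context windows" idea in miniature.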
My First Skill: TDD Workflow
Here’s the skill I built first:
    ---
    name: tdd-workflow
    description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.
    ---

    # Test-Driven Development Workflow

    ## Core Principles

    ### 1. Tests BEFORE Code
    ALWAYS write tests first, then implement code to make tests pass.

    ### 2. Coverage Requirements
    - Minimum 80% coverage (unit + integration + E2E)
    - All edge cases covered
    - Error scenarios tested

    ### 3. Test Types

    #### Unit Tests
    - Individual functions and utilities
    - Pure functions and helpers

    #### Integration Tests
    - API endpoints
    - Database operations

    #### E2E Tests
    - Critical user flows
    - Complete workflows

    ## TDD Workflow Steps

    1. Write user journey: "As a [role], I want to [action]"
    2. Generate test cases for each journey
    3. Run tests (they should fail)
    4. Implement minimal code
    5. Run tests (they should pass)
    6. Refactor
    7. Verify 80%+ coverage

The key insight: I put ONLY the essential workflow in SKILL.md. Detailed patterns and examples went in a separate references file.
Separating Content: The Reference Pattern
I initially wrote a 400-line SKILL.md with every testing pattern I knew. This came uncomfortably close to the “keep SKILL.md under 500 lines” guideline and bloated the context window every time the skill triggered.
The solution was splitting content:
    tdd-workflow/
    ├── SKILL.md (core workflow only - 80 lines)
    ├── references/
    │   ├── unit-test-patterns.md (detailed examples)
    │   ├── integration-patterns.md (API testing patterns)
    │   └── e2e-patterns.md (Playwright examples)

SKILL.md now references these files:

    ## Testing Patterns

    - **Unit Tests**: See [unit-test-patterns.md](references/unit-test-patterns.md)
    - **Integration Tests**: See [integration-patterns.md](references/integration-patterns.md)
    - **E2E Tests**: See [e2e-patterns.md](references/e2e-patterns.md)

Claude loads reference files only when needed, saving context for other tasks.
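After splitting, it is easy to leave a dangling link or let SKILL.md creep back up in size. The following is a minimal sketch of a check for both, assuming links use the `references/...` relative form shown above; `check_skill` and `LINK_PATTERN` are names I made up, not part of any official tooling.

```python
import re
from pathlib import Path

MAX_LINES = 500  # the line-count guideline mentioned above
# Matches markdown link targets such as (references/unit-test-patterns.md).
LINK_PATTERN = re.compile(r"\]\((references/[^)\s]+)\)")


def check_skill(skill_dir: Path) -> list[str]:
    """Report an oversized SKILL.md and links to reference files that do not exist."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    problems = []
    line_count = len(text.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"SKILL.md has {line_count} lines (limit {MAX_LINES})")
    for target in LINK_PATTERN.findall(text):
        if not (skill_dir / target).is_file():
            problems.append(f"missing reference file: {target}")
    return problems
```

Calling `check_skill(Path.home() / ".claude" / "skills" / "tdd-workflow")` returns an empty list when everything lines up.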
When to Use Scripts vs. Text Instructions
I struggled with deciding when to write a script versus when to write text instructions.
The guideline is:
| Freedom Level | Format | When to Use |
|---|---|---|
| High freedom | Text instructions | Multiple approaches valid, context-dependent decisions |
| Medium freedom | Pseudocode with parameters | Preferred pattern exists, some variation OK |
| Low freedom | Scripts with few parameters | Operations fragile, consistency critical |
Example:
    Task: Run tests and check coverage
    → High freedom: "Run tests, verify 80% coverage"
    → Text instruction works

    Task: Rotate PDF pages
    → Low freedom: Must work exactly the same every time
    → Script: scripts/rotate_pdf.py

    Task: Create new API endpoint
    → Medium freedom: Pattern exists, details vary
    → Pseudocode template with parameters

My Skill Creation Process
I followed this workflow after reading the skill-creator documentation:
Step 1: Understand with Concrete Examples
I listed specific scenarios where I wanted Claude to behave consistently:
- “When I say ‘add feature’, write tests first”
- “When I say ‘fix bug’, write test for bug, then fix”
- “When I say ‘refactor’, ensure tests still pass”
Step 2: Plan Reusable Contents
For each scenario, I asked: “What would Claude need every time?”
- Test patterns → references/
- Coverage commands → SKILL.md
- Mock templates → references/
Step 3: Initialize with Script
I used the init script:
    scripts/init_skill.py tdd-workflow --path ~/.claude/skills/

This created the directory structure with placeholder files.
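I have not reproduced the real `init_skill.py` here, but a rough equivalent is short enough to sketch. The template text, function name, and folder list below are my own assumptions based on the skill layout described earlier.

```python
from pathlib import Path

# Placeholder SKILL.md; the TODO text is illustrative, not the official template.
SKILL_TEMPLATE = """---
name: {name}
description: TODO - say what the skill does and when to use it.
---

# {title}

TODO - add instructions.
"""


def init_skill(name: str, parent: Path) -> Path:
    """Create <parent>/<name>/ with a placeholder SKILL.md and resource folders."""
    skill_dir = parent / name
    skill_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing skill
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir()
    title = name.replace("-", " ").title()
    (skill_dir / "SKILL.md").write_text(
        SKILL_TEMPLATE.format(name=name, title=title), encoding="utf-8"
    )
    return skill_dir
```

For example, `init_skill("tdd-workflow", Path.home() / ".claude" / "skills")` would produce the same kind of skeleton, and it raises `FileExistsError` rather than clobbering a skill that is already there.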
Step 4: Edit SKILL.md
I wrote the frontmatter description first (the trigger), then the body instructions.
Step 5: Package the Skill
    scripts/package_skill.py ~/.claude/skills/tdd-workflow

This validates the skill and creates a distributable .skill file.
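To make "validate, then package" concrete, here is a hedged sketch of what such a step could look like. It assumes a .skill file is a plain zip archive containing the skill folder, and its validation is deliberately minimal; the real `package_skill.py` may check more and may use a different layout.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir: Path, out_dir: Path) -> Path:
    """Validate a skill folder minimally and zip it into <out_dir>/<name>.skill."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"{skill_md} is required")
    if not skill_md.read_text(encoding="utf-8").startswith("---"):
        raise ValueError("SKILL.md must start with YAML frontmatter")

    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{skill_dir.name}.skill"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(skill_dir.rglob("*")):
            if path.is_file():
                # Paths are stored relative to the parent, so the archive
                # unpacks to skill-name/SKILL.md, skill-name/references/..., etc.
                zf.write(path, path.relative_to(skill_dir.parent).as_posix())
    return archive
```

Keep `out_dir` outside the skill folder; otherwise the archive being written would be picked up by the directory walk.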
Step 6: Iterate Based on Usage
After using the skill, I noticed Claude wasn’t always running tests first. I added this to SKILL.md:
    ### 1. Tests BEFORE Code
    ALWAYS write tests first. If Claude starts implementing before tests, STOP and write tests.

The words “ALWAYS” and “STOP” made the instruction more enforceable.
Common Mistakes I Made
Mistake 1: Description too abstract
    description: Testing workflow for quality code.

Claude never triggered this. I fixed it with concrete trigger phrases.
Mistake 2: “When to Use” in body instead of description
I wrote a “When to Use This Skill” section in SKILL.md body. But the body only loads after the skill triggers, so Claude never saw these instructions.
Mistake 3: All content in one file
My 400-line SKILL.md bloated context. I split it into references.
Mistake 4: No enforcement language
I wrote “Write tests first” and Claude sometimes ignored it. I changed to “ALWAYS write tests first. STOP if implementing before tests.”
Mistake 5: Wrong scope for resources
I put project-specific schemas in a user-scope skill. Now I use project-scope for project-specific content.
Sprint Process: The Complete Workflow
The gstack thread described a “Sprint-as-a-process” framework:
    Think → Plan → Build → Review → Test → Ship → Reflect

I translated this into skill triggers:

    1. Think: /think-mode (built-in)
    2. Plan: "plan implementation" → planning skill
    3. Build: "new feature" → tdd-workflow skill
    4. Review: "review code" → code-reviewer skill
    5. Test: Built into tdd-workflow
    6. Ship: "commit" → git workflow
    7. Reflect: "what did we learn" → summary

Each phase has a fresh context window, preventing the confusion that happens when one agent does everything.
What Changed After Building Skills
Before skills:
- Repeated instructions every session
- Claude forgot preferences mid-session
- Inconsistent test coverage
- Manual review checklist enforcement
After skills:
- Claude loads TDD workflow automatically
- Coverage stays above 80%
- Review happens after every feature
- Security checks are consistent
Summary
Building custom Claude Code skills involves:
- SKILL.md with proper frontmatter: Name and description (description is the trigger)
- Role-based prompting: Multiple skills with distinct roles, not one skill doing everything
- Progressive disclosure: Metadata always visible, body when triggered, references as needed
- Split content: Core workflow in SKILL.md, details in references
- Appropriate freedom level: Text for high freedom, scripts for low freedom
The most transferable lesson from the gstack discussion isn’t the tool itself - it’s the pattern: give AI agents distinct roles, structured processes, and clear boundaries.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can contact me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 Claude Code Skill Creator Documentation
- 👨💻 Reddit Discussion: gstack Role-Based Prompting
- 👨💻 Model Context Protocol Specification
- 👨💻 Anthropic Claude Code Guide
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!