How to Build Custom Claude Code Skills with Role-Based Prompting

Mar 31, 2026

I kept repeating the same instructions to Claude Code: “Write tests first, check coverage, use this naming pattern, review the code.” Every session felt like I was retraining Claude from scratch.

Then I discovered skills. A skill is a reusable instruction package that Claude loads automatically when triggered. But the real insight came from a Reddit thread about gstack: the key pattern isn’t just having instructions - it’s giving AI agents distinct roles in separate context windows.

The Problem: Context Repetition

Every time I started a new coding session, I had to:

Explain my testing requirements (TDD, 80% coverage)
Describe my naming conventions
Outline my review process
Specify my security checklist

This wasted tokens and time. Worse, Claude sometimes forgot mid-session and I had to remind it again.

I tried putting instructions in a README file, but Claude wouldn’t consistently read it. I tried adding comments to my code, but that only helped for specific files.

The Solution: SKILL.md Files

Skills solve this problem with a simple structure:

skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter (name + description)
│   └── Markdown instructions
├── scripts/ (optional - executable code)
├── references/ (optional - documentation)
└── assets/ (optional - templates, files)

The critical part is the YAML frontmatter:

---
name: tdd-workflow
description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.
---

I initially made a mistake with the description field. I wrote:

description: Test-driven development workflow with 80% coverage requirements.

This was too vague. Claude didn’t know when to use the skill. The description needs two things: what the skill does AND when to trigger it.

description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.

Now Claude automatically loads this skill whenever I mention “new feature” or “fix bug”.

Progressive Disclosure: Three Loading Levels

Skills use a three-level loading system that I didn’t understand at first:

Level 1: Metadata (name + description)
  → Always in context (~100 words)
  → Claude sees this for every skill

Level 2: SKILL.md body
  → Loaded when skill triggers (<5k words)
  → Full instructions, patterns, examples

Level 3: Bundled resources
  → Loaded as needed (scripts, references)
  → Unlimited because scripts execute without context

This means the description field is the trigger. I wasted time writing detailed “When to Use” sections in the SKILL.md body, but Claude never saw them because the body only loads after triggering.

Role-Based Prompting: The gstack Insight

A Reddit thread about gstack revealed the deeper pattern. The commenter noted:

“The insight that you get better output by separating planning, review, and QA into distinct context windows is valuable.”

This isn’t about one skill doing everything. It’s about multiple skills with distinct roles:

┌─────────────┐
│   Planner   │  → Think → Plan
│   Skill     │     (separate context)
└─────────────┘
       │
       ↓ (output passed to next)
┌─────────────┐
│   Builder   │  → Build → Code
│   Skill     │     (fresh context)
└─────────────┘
       │
       ↓ (output passed to next)
┌─────────────┐
│  Reviewer   │  → Review → QA
│   Skill     │     (fresh context)
└─────────────┘

The gstack approach used roles like CEO, Engineering Manager, Designer, Reviewer & QA Lead, Security Officer. Each role operates in a clean context window, preventing the confusion that happens when one agent tries to do everything.

I adapted this for my workflow:

Role	Skill	Trigger
Planner	`planning-with-files`	”plan implementation”
Builder	`tdd-workflow`	”new feature” or “fix bug”
Reviewer	`code-reviewer`	”review code”
Security	`security-review`	”security check”

My First Skill: TDD Workflow

Here’s the skill I built first:

---
name: tdd-workflow
description: Use this skill when writing new features, fixing bugs, or refactoring code. Enforces test-driven development with 80%+ coverage including unit, integration, and E2E tests.
---

# Test-Driven Development Workflow

## Core Principles

### 1. Tests BEFORE Code
ALWAYS write tests first, then implement code to make tests pass.

### 2. Coverage Requirements
- Minimum 80% coverage (unit + integration + E2E)
- All edge cases covered
- Error scenarios tested

### 3. Test Types

#### Unit Tests
- Individual functions and utilities
- Pure functions and helpers

#### Integration Tests
- API endpoints
- Database operations

#### E2E Tests
- Critical user flows
- Complete workflows

## TDD Workflow Steps

1. Write user journey: "As a [role], I want to [action]"
2. Generate test cases for each journey
3. Run tests (they should fail)
4. Implement minimal code
5. Run tests (they should pass)
6. Refactor
7. Verify 80%+ coverage

The key insight: I put ONLY the essential workflow in SKILL.md. Detailed patterns and examples went in a separate references file.

Separating Content: The Reference Pattern

I initially wrote a 400-line SKILL.md with every testing pattern I knew. This violated the “keep SKILL.md under 500 lines” guideline and bloated the context window.

The solution was splitting content:

tdd-workflow/
├── SKILL.md (core workflow only - 80 lines)
├── references/
│   ├── unit-test-patterns.md (detailed examples)
│   ├── integration-patterns.md (API testing patterns)
│   └── e2e-patterns.md (Playwright examples)

SKILL.md now references these files:

## Testing Patterns

- **Unit Tests**: See [unit-test-patterns.md](references/unit-test-patterns.md)
- **Integration Tests**: See [integration-patterns.md](references/integration-patterns.md)
- **E2E Tests**: See [e2e-patterns.md](references/e2e-patterns.md)

Claude loads reference files only when needed, saving context for other tasks.

When to Use Scripts vs. Text Instructions

I struggled with deciding when to write a script versus when to write text instructions.

The guideline is:

Freedom Level	Format	When to Use
High freedom	Text instructions	Multiple approaches valid, context-dependent decisions
Medium freedom	Pseudocode with parameters	Preferred pattern exists, some variation OK
Low freedom	Scripts with few parameters	Operations fragile, consistency critical

Example:

Task: Run tests and check coverage
→ High freedom: "Run tests, verify 80% coverage"
→ Text instruction works

Task: Rotate PDF pages
→ Low freedom: Must work exactly the same every time
→ Script: scripts/rotate_pdf.py

Task: Create new API endpoint
→ Medium freedom: Pattern exists, details vary
→ Pseudocode template with parameters

My Skill Creation Process

I followed this workflow after reading the skill-creator documentation:

Step 1: Understand with Concrete Examples

I listed specific scenarios where I wanted Claude to behave consistently:

“When I say ‘add feature’, write tests first”
“When I say ‘fix bug’, write test for bug, then fix”
“When I say ‘refactor’, ensure tests still pass”

Step 2: Plan Reusable Contents

For each scenario, I asked: “What would Claude need every time?”

Test patterns → references/
Coverage commands → SKILL.md
Mock templates → references/

Step 3: Initialize with Script

I used the init script:

scripts/init_skill.py tdd-workflow --path ~/.claude/skills/

This created the directory structure with placeholder files.

Step 4: Edit SKILL.md

I wrote the frontmatter description first (the trigger), then the body instructions.

Step 5: Package the Skill

scripts/package_skill.py ~/.claude/skills/tdd-workflow

This validates the skill and creates a distributable .skill file.

Step 6: Iterate Based on Usage

After using the skill, I noticed Claude wasn’t always running tests first. I added this to SKILL.md:

### 1. Tests BEFORE Code
ALWAYS write tests first. If Claude starts implementing before tests, STOP and write tests.

The word “ALWAYS” and “STOP” made the instruction more enforceable.

Common Mistakes I Made

Mistake 1: Description too abstract

description: Testing workflow for quality code.

Claude never triggered this. I fixed it with concrete trigger phrases.

Mistake 2: “When to Use” in body instead of description

I wrote a “When to Use This Skill” section in SKILL.md body. But the body only loads after the skill triggers, so Claude never saw these instructions.

Mistake 3: All content in one file

My 400-line SKILL.md bloated context. I split it into references.

Mistake 4: No enforcement language

I wrote “Write tests first” and Claude sometimes ignored it. I changed to “ALWAYS write tests first. STOP if implementing before tests.”

Mistake 5: Wrong scope for resources

I put project-specific schemas in a user-scope skill. Now I use project-scope for project-specific content.

Sprint Process: The Complete Workflow

The gstack thread described a “Sprint-as-a-process” framework:

Think → Plan → Build → Review → Test → Ship → Reflect

I translated this into skill triggers:

1. Think: /think-mode (built-in)
2. Plan: "plan implementation" → planning skill
3. Build: "new feature" → tdd-workflow skill
4. Review: "review code" → code-reviewer skill
5. Test: Built into tdd-workflow
6. Ship: "commit" → git workflow
7. Reflect: "what did we learn" → summary

Each phase has a fresh context window, preventing the confusion that happens when one agent does everything.

What Changed After Building Skills

Before skills:

Repeated instructions every session
Claude forgot preferences mid-session
Inconsistent test coverage
Manual review checklist enforcement

After skills:

Claude loads TDD workflow automatically
Coverage stays above 80%
Review happens after every feature
Security checks are consistent

Summary

Building custom Claude Code skills involves:

SKILL.md with proper frontmatter: Name and description (description is the trigger)
Role-based prompting: Multiple skills with distinct roles, not one skill doing everything
Progressive disclosure: Metadata always visible, body when triggered, references as needed
Split content: Core workflow in SKILL.md, details in references
Appropriate freedom level: Text for high freedom, scripts for low freedom

The most transferable lesson from the gstack discussion isn’t the tool itself - it’s the pattern: give AI agents distinct roles, structured processes, and clear boundaries.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Claude Code Skill Creator Documentation
👨‍💻 Reddit Discussion: gstack Role-Based Prompting
👨‍💻 Model Context Protocol Specification
👨‍💻 Anthropic Claude Code Guide

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!