How to Create Custom Claude Skills That Actually Work

Mar 28, 2026

Problem

Claude kept skipping steps.

I’d ask it to implement a feature, and it would jump straight to coding without running tests first. I’d tell it to follow a specific workflow, and it would combine steps or skip verification. Every session felt like retraining Claude from scratch.

Here’s what happened when I tried to enforce TDD:

Me: Implement the login feature using TDD

Claude: [writes implementation code]

Me: Wait, you skipped the test-first step

Claude: Sorry, let me write the tests now

Me: And you didn't verify coverage

Claude: I'll check that after...

Me: This is exactly what I told you not to do last time

I repeated the same instructions every session. Claude would agree, then forget. I needed a way to encode these workflows permanently.

What I Discovered

I found a Reddit thread where someone had the exact same problem:

“Claude kept skipping steps. So I created a framework that forces it to adhere to the rules and skills I set.”

This led me to Claude Code Skills - a way to package workflows into reusable modules that Claude loads automatically.

Skills are “onboarding guides” for specific tasks. Instead of explaining TDD every session, I could write a skill that encodes the entire workflow. Claude would load it and follow it consistently.

The SKILL.md Format

Every skill needs a SKILL.md file. The structure is simple:

my-skill/
├── SKILL.md (required)
│   ├── YAML frontmatter (name + description)
│   └── Markdown instructions
└── Optional resources:
    ├── scripts/       - Executable code
    ├── references/    - Documentation
    └── assets/        - Templates

I tried creating my first skill for TDD enforcement:

---
name: tdd-workflow
description: Enforces test-driven development with 80%+ coverage. Use when writing new features or fixing bugs.
---

# Test-Driven Development Workflow

Write tests first, then implement.

## Steps

1. Write failing test for the feature
2. Run test - confirm it fails
3. Write minimal code to pass
4. Run test - confirm it passes
5. Refactor
6. Verify 80%+ coverage

I tested it. But the skill didn’t trigger reliably.

Why My First Skill Failed

The skill had a vague description. “Use when writing new features or fixing bugs” - but when exactly? Claude couldn’t tell if this situation matched.

I looked at examples from other users. One had created a “contrarian agent” skill:

---
name: contrarian
description: A sub-agent that critiques plans and decisions. Use when
  (1) reviewing implementation plans,
  (2) evaluating architectural decisions, or
  (3) user needs critical feedback on proposed approaches.
---

The difference was clear: this description had specific triggers. “Use when (1), (2), (3)” - Claude could match against these exact conditions.

I rewrote my TDD skill:

---
name: tdd-workflow
description: Enforces test-driven development with 80%+ coverage. Use when
  (1) user requests a new feature,
  (2) user asks to fix a bug,
  (3) user wants to refactor code, or
  (4) user is adding new API endpoints.
---

Now the skill triggers consistently.

Three Levels of Loading

Skills use progressive disclosure to manage context:

Level 1: Metadata (name + description)
         ↓ Always loaded in context
         ~10-50 words

Level 2: SKILL.md body
         ↓ Loaded when skill triggers
         ~500-5000 words

Level 3: Bundled resources
         ↓ Loaded as needed
         scripts/, references/, assets/

This means I should:

Put triggers in the description (Level 1) - always visible
Put detailed instructions in the body (Level 2) - loaded when needed
Put heavy references in separate files (Level 3) - loaded on demand

Solving the Skipping Steps Problem

The Reddit user mentioned their solution: a skill that forces Claude to confirm each step. I created a similar one:

---
name: enforce-workflow
description: Forces Claude to follow every workflow step without skipping.
  Use when (1) implementing complex multi-step processes,
  (2) Claude has a tendency to skip verification steps, or
  (3) strict process compliance is required.
---

# Workflow Enforcement

## Mandatory Process

Each step MUST be completed before proceeding.

### Phase 1: Planning
1. Document requirements - REQUIRED before coding
2. Identify dependencies - REQUIRED before implementation
3. Create implementation plan - REQUIRED before touching code

### Phase 2: Implementation
4. Write tests FIRST - REQUIRED (no code without tests)
5. Implement minimal solution
6. Run tests - REQUIRED before refactoring

### Phase 3: Verification
7. Verify coverage (80%+) - REQUIRED before commit
8. Run linting - REQUIRED before commit
9. Commit only when ALL above complete

## Verification

After each step, confirm:
- "Step X complete: [result]"
- "Proceeding to Step Y"

Do not skip or combine steps.

I tested this skill with a feature request:

Me: Add user authentication to the API

Claude: [Loading enforce-workflow skill]

Step 1: Document requirements
- I need to understand the authentication requirements first
[Asks clarifying questions about auth type, token format, etc.]

Me: JWT tokens, stored in httpOnly cookies

Claude: Step 1 complete: Requirements documented
Proceeding to Step 2: Identify dependencies

[Continues through each step, confirming each one]

The skill worked. Claude followed every step.

When to Use Skills vs CLAUDE.md

I was confused about when to use skills versus CLAUDE.md rules. Here’s the distinction:

CLAUDE.md:
- Global behavior (personality, communication style)
- Project-specific rules (lint, test requirements)
- Always-on instructions

Skills:
- Domain-specific workflows (TDD, code review, deployment)
- Complex multi-step processes
- Shareable across projects and users

Use CLAUDE.md for “how to behave.” Use skills for “how to do something.”

Real Examples from the Community

From the Reddit discussion, I found several useful skill patterns:

Pattern 1: Peer Review Skill

---
name: peer-review
description: Automated peer review for user stories with QA scoring. Use when
  (1) reviewing user stories for completeness,
  (2) generating QA scores for requirements, or
  (3) validating story format against template.
---

This skill automates a domain-specific task that would otherwise require explaining QA criteria every time.

Pattern 2: Knowledge Integration Skill

---
name: obsidian-sync
description: Integrates transcripts and context from notes/emails into Obsidian
  knowledge base. Use when (1) processing meeting transcripts,
  (2) syncing context from emails or notes, or
  (3) updating Obsidian knowledge graph.
---

This skill connects Claude to external systems with specific formatting requirements.

Pattern 3: Contrarian Agent

---
name: contrarian
description: Sub-agent that critiques plans and decisions. Use when
  (1) reviewing implementation plans,
  (2) evaluating architectural decisions, or
  (3) user needs critical feedback on proposed approaches.
---

This skill creates a specialized sub-agent role - Claude acts as a critic instead of an agreeable assistant.

Adding Bundled Resources

For complex skills, I can add supporting files:

peer-review-skill/
├── SKILL.md
├── references/
│   ├── user-story-template.md
│   └── qa-criteria.md
└── scripts/
    └── score_calculator.py

In the SKILL.md body, I reference these:

## Resources

- **User story format**: See `references/user-story-template.md`
- **QA scoring criteria**: See `references/qa-criteria.md`
- **Score calculation**: Run `scripts/score_calculator.py`

Claude loads these only when needed, saving context for other tasks.

Design Principles I Learned

Principle 1: Be Concise

The context window is shared. I challenge every piece: “Does Claude really need this?” Claude is already smart - I only add context it doesn’t have.

Principle 2: Match Specificity to Task Fragility

High freedom (text instructions):
  Multiple approaches are valid
  Example: "Implement error handling"

Medium freedom (pseudocode):
  Preferred pattern exists
  Example: "Try-catch with specific error types"

Low freedom (specific scripts):
  Operations are fragile
  Example: "Run scripts/deploy.sh --env prod"

Fragile operations need low freedom. Flexible tasks need high freedom.

Principle 3: Test with Real Use Cases

I created a skill and immediately tested it with the exact scenarios I wanted to handle. If it didn’t trigger reliably, I refined the description. If it triggered at wrong times, I narrowed the conditions.

Common Mistakes

Mistake 1: Vague Descriptions

# BAD: Too vague
description: Helps with testing

# GOOD: Specific triggers
description: Enforces TDD workflow. Use when (1) writing new features,
  (2) fixing bugs, or (3) refactoring existing code.

Mistake 2: Too Much in the Body

# BAD: Everything in body
---
name: my-skill
description: Does stuff
---

[5000 words of instructions]

The body loads every time the skill triggers. I split large content into references/ files.

Mistake 3: No Testing

I initially assumed my skills would work. They didn’t. Now I test every skill with multiple scenarios before relying on it.

Summary

Custom Claude skills solved my “skipping steps” problem. Instead of re-explaining workflows every session, I encode them in SKILL.md files that load automatically.

The key insights:

Put specific triggers in the description - Claude needs to know exactly when to use the skill
Use progressive disclosure - metadata always, body on trigger, resources on demand
Test with real scenarios - refine descriptions until triggers work reliably

Skills transform Claude from a general assistant into a specialized agent with encoded procedural knowledge. I no longer repeat instructions. I write them once, and Claude follows them every time.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!