
How to Create Custom Claude Skills That Actually Work

Problem

Claude kept skipping steps.

I’d ask it to implement a feature, and it would jump straight to coding without running tests first. I’d tell it to follow a specific workflow, and it would combine steps or skip verification. Every session felt like retraining Claude from scratch.

Here’s what happened when I tried to enforce TDD:

My Failed Conversation
Me: Implement the login feature using TDD
Claude: [writes implementation code]
Me: Wait, you skipped the test-first step
Claude: Sorry, let me write the tests now
Me: And you didn't verify coverage
Claude: I'll check that after...
Me: This is exactly what I told you not to do last time

I repeated the same instructions every session. Claude would agree, then forget. I needed a way to encode these workflows permanently.

What I Discovered

I found a Reddit thread where someone had the exact same problem:

“Claude kept skipping steps. So I created a framework that forces it to adhere to the rules and skills I set.”

This led me to Claude Code Skills, a way to package workflows into reusable modules that Claude loads when a request matches the skill’s description.

Skills are “onboarding guides” for specific tasks. Instead of explaining TDD every session, I could write a skill that encodes the entire workflow. Claude would load it and follow it consistently.

The SKILL.md Format

Every skill needs a SKILL.md file. The structure is simple:

Skill Directory Structure
my-skill/
├── SKILL.md (required)
│   ├── YAML frontmatter (name + description)
│   └── Markdown instructions
└── Optional resources:
    ├── scripts/ - Executable code
    ├── references/ - Documentation
    └── assets/ - Templates
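The layout above can be scaffolded with a short script. What follows is a minimal sketch in Python; `scaffold_skill`, its parameters, and the placeholder body text are hypothetical names chosen for illustration and are not part of any Claude tooling.

```python
from pathlib import Path

# Placeholder SKILL.md content: YAML frontmatter followed by a Markdown body.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---
# {title}

TODO: add instructions.
"""


def scaffold_skill(root: Path, name: str, description: str,
                   with_resources: bool = False) -> Path:
    """Create <root>/<name>/SKILL.md and, optionally, the resource folders."""
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    title = name.replace("-", " ").title()
    (skill_dir / "SKILL.md").write_text(
        SKILL_TEMPLATE.format(name=name, description=description, title=title),
        encoding="utf-8",
    )
    if with_resources:
        # The three optional folders from the directory structure above.
        for sub in ("scripts", "references", "assets"):
            (skill_dir / sub).mkdir(exist_ok=True)
    return skill_dir
```

Calling `scaffold_skill(Path.home() / ".claude" / "skills", "my-skill", "...")` would place the skill in the personal skills folder used in the examples in this article. The function overwrites an existing SKILL.md, so it is best pointed at a new skill name.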

I tried creating my first skill for TDD enforcement:

~/.claude/skills/tdd-workflow/SKILL.md
---
name: tdd-workflow
description: Enforces test-driven development with 80%+ coverage. Use when writing new features or fixing bugs.
---
# Test-Driven Development Workflow
Write tests first, then implement.
## Steps
1. Write failing test for the feature
2. Run test - confirm it fails
3. Write minimal code to pass
4. Run test - confirm it passes
5. Refactor
6. Verify 80%+ coverage

I tested it. But the skill didn’t trigger reliably.

Why My First Skill Failed

The skill had a vague description. “Use when writing new features or fixing bugs” - but when exactly? Claude couldn’t tell whether a given request matched.

I looked at examples from other users. One had created a “contrarian agent” skill:

Good Description Example
---
name: contrarian
description: A sub-agent that critiques plans and decisions. Use when
  (1) reviewing implementation plans,
  (2) evaluating architectural decisions, or
  (3) user needs critical feedback on proposed approaches.
---

The difference was clear: this description had specific triggers. “Use when (1), (2), (3)” - Claude could match against these exact conditions.

I rewrote my TDD skill:

Improved Description
---
name: tdd-workflow
description: Enforces test-driven development with 80%+ coverage. Use when
  (1) user requests a new feature,
  (2) user asks to fix a bug,
  (3) user wants to refactor code, or
  (4) user is adding new API endpoints.
---

Now the skill triggers consistently.
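A quick heuristic check can catch vague descriptions before a skill is ever tried in a session. The sketch below assumes the “Use when (1), (2), ...” convention shown above; `lint_description` is a hypothetical helper, and its two rules reflect the pattern in this article rather than anything Claude itself enforces.

```python
import re


def lint_description(description: str) -> list[str]:
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # Rule 1: the description should say when the skill applies.
    if not re.search(r"\buse when\b", description, flags=re.IGNORECASE):
        problems.append('missing a "Use when" clause')
    # Rule 2: count enumerated triggers such as "(1)", "(2)", ...
    triggers = re.findall(r"\(\d+\)", description)
    if len(triggers) < 2:
        problems.append("fewer than two numbered trigger conditions")
    return problems
```

Passing the check does not guarantee reliable triggering. It only flags descriptions that lack the explicit conditions that made the difference here.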

Three Levels of Loading

Skills use progressive disclosure to manage context:

Skill Loading Levels
Level 1: Metadata (name + description)
↓ Always loaded in context
~10-50 words
Level 2: SKILL.md body
↓ Loaded when skill triggers
~500-5000 words
Level 3: Bundled resources
↓ Loaded as needed
scripts/, references/, assets/

This means I should:

  • Put triggers in the description (Level 1) - always visible
  • Put detailed instructions in the body (Level 2) - loaded when needed
  • Put heavy references in separate files (Level 3) - loaded on demand

Solving the Skipping Steps Problem

The Reddit user mentioned their solution: a skill that forces Claude to confirm each step. I created a similar one:

~/.claude/skills/enforce-workflow/SKILL.md
---
name: enforce-workflow
description: Forces Claude to follow every workflow step without skipping.
  Use when (1) implementing complex multi-step processes,
  (2) Claude has a tendency to skip verification steps, or
  (3) strict process compliance is required.
---
# Workflow Enforcement
## Mandatory Process
Each step MUST be completed before proceeding.
### Phase 1: Planning
1. Document requirements - REQUIRED before coding
2. Identify dependencies - REQUIRED before implementation
3. Create implementation plan - REQUIRED before touching code
### Phase 2: Implementation
4. Write tests FIRST - REQUIRED (no code without tests)
5. Implement minimal solution
6. Run tests - REQUIRED before refactoring
### Phase 3: Verification
7. Verify coverage (80%+) - REQUIRED before commit
8. Run linting - REQUIRED before commit
9. Commit only when ALL above complete
## Verification
After each step, confirm:
- "Step X complete: [result]"
- "Proceeding to Step Y"
Do not skip or combine steps.

I tested this skill with a feature request:

Test Session
Me: Add user authentication to the API
Claude: [Loading enforce-workflow skill]
Step 1: Document requirements
- I need to understand the authentication requirements first
[Asks clarifying questions about auth type, token format, etc.]
Me: JWT tokens, stored in httpOnly cookies
Claude: Step 1 complete: Requirements documented
Proceeding to Step 2: Identify dependencies
[Continues through each step, confirming each one]

The skill worked. Claude followed every step.
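Checking a long transcript by eye for skipped steps gets tedious. Because the skill asks for a “Step X complete” line after every step, a saved transcript can be scanned for those markers. The following is a small sketch under that assumption; `check_step_order` is a hypothetical helper, and it relies on the confirmation wording matching the skill exactly.

```python
import re


def check_step_order(transcript: str, total_steps: int) -> list[str]:
    """Report skipped, repeated, or out-of-order steps in a session transcript."""
    # Collect step numbers in the order their confirmations appear.
    completed = [int(n) for n in re.findall(r"Step (\d+) complete", transcript)]
    problems = []
    expected = 1
    for step in completed:
        if step != expected:
            problems.append(f"expected Step {expected}, saw Step {step}")
        expected = step + 1
    if expected <= total_steps:
        problems.append(f"stopped before Step {expected} of {total_steps}")
    return problems
```

An empty list means every step from 1 to `total_steps` was confirmed in order. It says nothing about whether the work inside each step was actually done well.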

When to Use Skills vs CLAUDE.md

I was confused about when to use skills versus CLAUDE.md rules. Here’s the distinction:

Skills vs CLAUDE.md Decision Guide
CLAUDE.md:
- Global behavior (personality, communication style)
- Project-specific rules (lint, test requirements)
- Always-on instructions
Skills:
- Domain-specific workflows (TDD, code review, deployment)
- Complex multi-step processes
- Shareable across projects and users

Use CLAUDE.md for “how to behave.” Use skills for “how to do something.”

Real Examples from the Community

From the Reddit discussion, I found several useful skill patterns:

Pattern 1: Peer Review Skill

Peer Review Skill Frontmatter
---
name: peer-review
description: Automated peer review for user stories with QA scoring. Use when
  (1) reviewing user stories for completeness,
  (2) generating QA scores for requirements, or
  (3) validating story format against template.
---

This skill automates a domain-specific task that would otherwise require explaining QA criteria every time.

Pattern 2: Knowledge Integration Skill

Knowledge Integration Skill Frontmatter
---
name: obsidian-sync
description: Integrates transcripts and context from notes/emails into Obsidian
  knowledge base. Use when (1) processing meeting transcripts,
  (2) syncing context from emails or notes, or
  (3) updating Obsidian knowledge graph.
---

This skill connects Claude to external systems with specific formatting requirements.

Pattern 3: Contrarian Agent

Contrarian Agent Frontmatter
---
name: contrarian
description: Sub-agent that critiques plans and decisions. Use when
  (1) reviewing implementation plans,
  (2) evaluating architectural decisions, or
  (3) user needs critical feedback on proposed approaches.
---

This skill creates a specialized sub-agent role - Claude acts as a critic instead of an agreeable assistant.

Adding Bundled Resources

For complex skills, I can add supporting files:

Skill with Resources
peer-review-skill/
├── SKILL.md
├── references/
│   ├── user-story-template.md
│   └── qa-criteria.md
└── scripts/
    └── score_calculator.py

In the SKILL.md body, I reference these:

Referencing Bundled Resources
## Resources
- **User story format**: See `references/user-story-template.md`
- **QA scoring criteria**: See `references/qa-criteria.md`
- **Score calculation**: Run `scripts/score_calculator.py`

Claude loads these only when needed, saving context for other tasks.
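A reference to a file that does not exist is easy to miss until the skill fails mid-task. The sketch below scans SKILL.md for backticked paths under `scripts/`, `references/`, or `assets/` and reports any that are missing on disk. `missing_resources` is a hypothetical helper, and it assumes resources are always referenced in backticks as in the example above.

```python
import re
from pathlib import Path

# Matches `scripts/...`, `references/...`, or `assets/...` inside backticks.
RESOURCE_REF = re.compile(r"`((?:scripts|references|assets)/[^`\s]+)`")


def missing_resources(skill_dir: Path) -> list[str]:
    """List resource paths mentioned in SKILL.md that do not exist on disk."""
    skill_dir = Path(skill_dir)
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    referenced = sorted(set(RESOURCE_REF.findall(body)))
    return [ref for ref in referenced if not (skill_dir / ref).is_file()]
```

Paths written with arguments inside the backticks, such as a script followed by flags, are not matched by this pattern and would need a looser expression.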

Design Principles I Learned

Principle 1: Be Concise

The context window is shared. I challenge every piece: “Does Claude really need this?” Claude is already smart - I only add context it doesn’t have.

Principle 2: Match Specificity to Task Fragility

Freedom Levels
High freedom (text instructions):
  Multiple approaches are valid
  Example: "Implement error handling"
Medium freedom (pseudocode):
  Preferred pattern exists
  Example: "Try-catch with specific error types"
Low freedom (specific scripts):
  Operations are fragile
  Example: "Run scripts/deploy.sh --env prod"

Fragile operations need low freedom. Flexible tasks need high freedom.

Principle 3: Test with Real Use Cases

I created a skill and immediately tested it with the exact scenarios I wanted to handle. If it didn’t trigger reliably, I refined the description. If it triggered at wrong times, I narrowed the conditions.
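One way to keep this testing honest is to write down each scenario with the expected and observed behavior, then sort the failures into the two cases described above. This is a minimal sketch; `Observation` and `summarize` are hypothetical names, and the observed values are recorded by hand after each session, not collected through any Claude API.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    prompt: str
    should_trigger: bool  # whether the skill was expected to load
    did_trigger: bool     # whether it actually loaded in the session


def summarize(observations: list[Observation]) -> dict[str, list[str]]:
    """Group failing prompts by the kind of fix they call for."""
    # Missed triggers: the description probably needs more specific conditions.
    missed = [o.prompt for o in observations
              if o.should_trigger and not o.did_trigger]
    # Spurious triggers: the conditions are probably too broad.
    spurious = [o.prompt for o in observations
                if o.did_trigger and not o.should_trigger]
    return {"missed": missed, "spurious": spurious}
```

Prompts under `missed` point toward refining the description, and prompts under `spurious` point toward narrowing the conditions.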

Common Mistakes

Mistake 1: Vague Descriptions

Vague vs Specific Description
# BAD: Too vague
description: Helps with testing
# GOOD: Specific triggers
description: Enforces TDD workflow. Use when (1) writing new features,
  (2) fixing bugs, or (3) refactoring existing code.

Mistake 2: Too Much in the Body

Skill with Too Much Content
# BAD: Everything in body
---
name: my-skill
description: Does stuff
---
[5000 words of instructions]

The body loads every time the skill triggers, so every extra paragraph costs context on each use. I now split large content into references/ files, which load only when needed.

Mistake 3: No Testing

I initially assumed my skills would work. They didn’t. Now I test every skill with multiple scenarios before relying on it.

Summary

Custom Claude skills solved my “skipping steps” problem. Instead of re-explaining workflows every session, I encode them in SKILL.md files that load automatically.

The key insights:

  1. Put specific triggers in the description - Claude needs to know exactly when to use the skill
  2. Use progressive disclosure - metadata always, body on trigger, resources on demand
  3. Test with real scenarios - refine descriptions until triggers work reliably

Skills transform Claude from a general assistant into a specialized agent with encoded procedural knowledge. I no longer repeat instructions. I write them once, and Claude follows them every time.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

