How to Create Custom Claude Skills That Actually Work
Problem
Claude kept skipping steps.
I’d ask it to implement a feature, and it would jump straight to coding without running tests first. I’d tell it to follow a specific workflow, and it would combine steps or skip verification. Every session felt like retraining Claude from scratch.
Here’s what happened when I tried to enforce TDD:
Me: Implement the login feature using TDD
Claude: [writes implementation code]
Me: Wait, you skipped the test-first step
Claude: Sorry, let me write the tests now
Me: And you didn't verify coverage
Claude: I'll check that after...
Me: This is exactly what I told you not to do last timeI repeated the same instructions every session. Claude would agree, then forget. I needed a way to encode these workflows permanently.
What I Discovered
I found a Reddit thread where someone had the exact same problem:
“Claude kept skipping steps. So I created a framework that forces it to adhere to the rules and skills I set.”
This led me to Claude Code Skills - a way to package workflows into reusable modules that Claude loads automatically.
Skills are “onboarding guides” for specific tasks. Instead of explaining TDD every session, I could write a skill that encodes the entire workflow. Claude would load it and follow it consistently.
The SKILL.md Format
Every skill needs a SKILL.md file. The structure is simple:
my-skill/├── SKILL.md (required)│ ├── YAML frontmatter (name + description)│ └── Markdown instructions└── Optional resources: ├── scripts/ - Executable code ├── references/ - Documentation └── assets/ - TemplatesI tried creating my first skill for TDD enforcement:
---name: tdd-workflowdescription: Enforces test-driven development with 80%+ coverage. Use when writing new features or fixing bugs.---
# Test-Driven Development Workflow
Write tests first, then implement.
## Steps
1. Write failing test for the feature2. Run test - confirm it fails3. Write minimal code to pass4. Run test - confirm it passes5. Refactor6. Verify 80%+ coverageI tested it. But the skill didn’t trigger reliably.
Why My First Skill Failed
The skill had a vague description. “Use when writing new features or fixing bugs” - but when exactly? Claude couldn’t tell if this situation matched.
I looked at examples from other users. One had created a “contrarian agent” skill:
---name: contrariandescription: A sub-agent that critiques plans and decisions. Use when (1) reviewing implementation plans, (2) evaluating architectural decisions, or (3) user needs critical feedback on proposed approaches.---The difference was clear: this description had specific triggers. “Use when (1), (2), (3)” - Claude could match against these exact conditions.
I rewrote my TDD skill:
---name: tdd-workflowdescription: Enforces test-driven development with 80%+ coverage. Use when (1) user requests a new feature, (2) user asks to fix a bug, (3) user wants to refactor code, or (4) user is adding new API endpoints.---Now the skill triggers consistently.
Three Levels of Loading
Skills use progressive disclosure to manage context:
Level 1: Metadata (name + description) ↓ Always loaded in context ~10-50 words
Level 2: SKILL.md body ↓ Loaded when skill triggers ~500-5000 words
Level 3: Bundled resources ↓ Loaded as needed scripts/, references/, assets/This means I should:
- Put triggers in the description (Level 1) - always visible
- Put detailed instructions in the body (Level 2) - loaded when needed
- Put heavy references in separate files (Level 3) - loaded on demand
Solving the Skipping Steps Problem
The Reddit user mentioned their solution: a skill that forces Claude to confirm each step. I created a similar one:
---name: enforce-workflowdescription: Forces Claude to follow every workflow step without skipping. Use when (1) implementing complex multi-step processes, (2) Claude has a tendency to skip verification steps, or (3) strict process compliance is required.---
# Workflow Enforcement
## Mandatory Process
Each step MUST be completed before proceeding.
### Phase 1: Planning1. Document requirements - REQUIRED before coding2. Identify dependencies - REQUIRED before implementation3. Create implementation plan - REQUIRED before touching code
### Phase 2: Implementation4. Write tests FIRST - REQUIRED (no code without tests)5. Implement minimal solution6. Run tests - REQUIRED before refactoring
### Phase 3: Verification7. Verify coverage (80%+) - REQUIRED before commit8. Run linting - REQUIRED before commit9. Commit only when ALL above complete
## Verification
After each step, confirm:- "Step X complete: [result]"- "Proceeding to Step Y"
Do not skip or combine steps.I tested this skill with a feature request:
Me: Add user authentication to the API
Claude: [Loading enforce-workflow skill]
Step 1: Document requirements- I need to understand the authentication requirements first[Asks clarifying questions about auth type, token format, etc.]
Me: JWT tokens, stored in httpOnly cookies
Claude: Step 1 complete: Requirements documentedProceeding to Step 2: Identify dependencies
[Continues through each step, confirming each one]The skill worked. Claude followed every step.
When to Use Skills vs CLAUDE.md
I was confused about when to use skills versus CLAUDE.md rules. Here’s the distinction:
CLAUDE.md:- Global behavior (personality, communication style)- Project-specific rules (lint, test requirements)- Always-on instructions
Skills:- Domain-specific workflows (TDD, code review, deployment)- Complex multi-step processes- Shareable across projects and usersUse CLAUDE.md for “how to behave.” Use skills for “how to do something.”
Real Examples from the Community
From the Reddit discussion, I found several useful skill patterns:
Pattern 1: Peer Review Skill
---name: peer-reviewdescription: Automated peer review for user stories with QA scoring. Use when (1) reviewing user stories for completeness, (2) generating QA scores for requirements, or (3) validating story format against template.---This skill automates a domain-specific task that would otherwise require explaining QA criteria every time.
Pattern 2: Knowledge Integration Skill
---name: obsidian-syncdescription: Integrates transcripts and context from notes/emails into Obsidian knowledge base. Use when (1) processing meeting transcripts, (2) syncing context from emails or notes, or (3) updating Obsidian knowledge graph.---This skill connects Claude to external systems with specific formatting requirements.
Pattern 3: Contrarian Agent
---name: contrariandescription: Sub-agent that critiques plans and decisions. Use when (1) reviewing implementation plans, (2) evaluating architectural decisions, or (3) user needs critical feedback on proposed approaches.---This skill creates a specialized sub-agent role - Claude acts as a critic instead of an agreeable assistant.
Adding Bundled Resources
For complex skills, I can add supporting files:
peer-review-skill/├── SKILL.md├── references/│ ├── user-story-template.md│ └── qa-criteria.md└── scripts/ └── score_calculator.pyIn the SKILL.md body, I reference these:
## Resources
- **User story format**: See `references/user-story-template.md`- **QA scoring criteria**: See `references/qa-criteria.md`- **Score calculation**: Run `scripts/score_calculator.py`Claude loads these only when needed, saving context for other tasks.
Design Principles I Learned
Principle 1: Be Concise
The context window is shared. I challenge every piece: “Does Claude really need this?” Claude is already smart - I only add context it doesn’t have.
Principle 2: Match Specificity to Task Fragility
High freedom (text instructions): Multiple approaches are valid Example: "Implement error handling"
Medium freedom (pseudocode): Preferred pattern exists Example: "Try-catch with specific error types"
Low freedom (specific scripts): Operations are fragile Example: "Run scripts/deploy.sh --env prod"Fragile operations need low freedom. Flexible tasks need high freedom.
Principle 3: Test with Real Use Cases
I created a skill and immediately tested it with the exact scenarios I wanted to handle. If it didn’t trigger reliably, I refined the description. If it triggered at wrong times, I narrowed the conditions.
Common Mistakes
Mistake 1: Vague Descriptions
# BAD: Too vaguedescription: Helps with testing
# GOOD: Specific triggersdescription: Enforces TDD workflow. Use when (1) writing new features, (2) fixing bugs, or (3) refactoring existing code.Mistake 2: Too Much in the Body
# BAD: Everything in body---name: my-skilldescription: Does stuff---
[5000 words of instructions]The body loads every time the skill triggers. I split large content into references/ files.
Mistake 3: No Testing
I initially assumed my skills would work. They didn’t. Now I test every skill with multiple scenarios before relying on it.
Summary
Custom Claude skills solved my “skipping steps” problem. Instead of re-explaining workflows every session, I encode them in SKILL.md files that load automatically.
The key insights:
- Put specific triggers in the description - Claude needs to know exactly when to use the skill
- Use progressive disclosure - metadata always, body on trigger, resources on demand
- Test with real scenarios - refine descriptions until triggers work reliably
Skills transform Claude from a general assistant into a specialized agent with encoded procedural knowledge. I no longer repeat instructions. I write them once, and Claude follows them every time.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments