How Skill Loading Works in AI Agents: On-Demand Knowledge Injection
Problem
My agent’s system prompt was massive. I had 10 different skills stuffed into it: git workflow, code review, testing patterns, PDF processing, MCP building. Each skill was about 2000 tokens. That’s 20,000 tokens of knowledge the model had to wade through before even starting a task.
The problem? Most of that knowledge was irrelevant to any given task.
When I asked my agent to review a pull request, it had all the PDF processing instructions loaded. When I asked it to build an MCP server, it had all the code review checklists in context. I was paying for tokens the model never used.
Here’s what my system prompt looked like:
You are a coding agent with the following capabilities:
## Git Workflow1. Always create a branch before changes2. Use conventional commit messages3. Run tests before pushing... (500 tokens)
## Code Review1. Check for security issues2. Verify test coverage >= 80%3. Look for performance problems... (600 tokens)
## PDF Processing1. Use PyMuPDF for reading PDFs2. Use ReportLab for creating PDFs3. Handle encoding issues carefully... (400 tokens)
## Testing Best Practices1. Write tests first (TDD)2. Use descriptive test names3. Mock external dependencies... (500 tokens)
Total: ~2000 tokens per skill x 10 skills = 20,000 tokensI was burning through context window space before the conversation even started.
What happened?
I searched for how real agent systems handle knowledge and found the learn-claude-code repository. The key insight was in session s05:
“Load knowledge when you need it, not upfront” — inject via tool_result, not the system prompt.
Looking at my approach, I could see the problem clearly:
- Push model: I was pushing all knowledge into the system prompt upfront
- Wasted tokens: The model paid attention to irrelevant skills
- Diluted focus: With 10 skills in context, the model had to filter through noise
The solution was a pull model: the model requests knowledge when it needs it.
How to solve it?
I rewrote my agent to use on-demand skill loading. The pattern uses two layers:
Layer 1: skill NAMES in system prompt (cheap)+--------------------------------------+| You are a coding agent. || Skills available: || - git: Git workflow helpers | ~100 tokens total| - test: Testing best practices || - pdf: PDF processing |+--------------------------------------+
Layer 2: skill BODY via tool_result (on demand)+--------------------------------------+| <skill name="git"> || Full git workflow instructions... | ~2000 tokens| Step 1: Create branch... || Step 2: Make changes... || </skill> |+--------------------------------------+Here’s how I implemented it:
Step 1: Create skill files
Each skill is a SKILL.md file in a dedicated directory:
skills/ pdf/ SKILL.md # PDF processing instructions code-review/ SKILL.md # Code review checklist mcp-builder/ SKILL.md # MCP server building guide git/ SKILL.md # Git workflow patternsEach SKILL.md has YAML frontmatter with a name and description:
---name: code-reviewdescription: Perform thorough code reviews with security, performance, and maintainability analysis. Use when user asks to review code, check for bugs, or audit a codebase.---
# Code Review Skill
You now have expertise in conducting comprehensive code reviews. Follow this structured approach:
## Review Checklist
### 1. Security (Critical)Check for:- [ ] Injection vulnerabilities: SQL, command, XSS- [ ] Authentication issues: Hardcoded credentials- [ ] Data exposure: Sensitive data in logs
### 2. CorrectnessCheck for:- [ ] Logic errors: Off-by-one, null handling- [ ] Race conditions: Concurrent access- [ ] Error handling: Swallowed exceptions...Step 2: Create a SkillLoader
The loader scans for SKILL.md files and provides two things: descriptions (cheap) and content (expensive):
from pathlib import Pathimport yaml
class SkillLoader: def __init__(self, skills_dir: Path): self.skills = {} for f in sorted(skills_dir.rglob("SKILL.md")): text = f.read_text() meta, body = self._parse_frontmatter(text) name = meta.get("name", f.parent.name) self.skills[name] = {"meta": meta, "body": body}
def _parse_frontmatter(self, text: str): """Parse YAML frontmatter from skill file.""" if text.startswith("---"): parts = text.split("---", 2) if len(parts) >= 3: meta = yaml.safe_load(parts[1]) return meta, parts[2].strip() return {}, text
def get_descriptions(self) -> str: """Return lightweight skill list for system prompt.""" lines = [] for name, skill in self.skills.items(): desc = skill["meta"].get("description", "") lines.append(f" - {name}: {desc}") return "\n".join(lines)
def get_content(self, name: str) -> str: """Return full skill content when model requests it.""" skill = self.skills.get(name) if not skill: return f"Error: Unknown skill '{name}'." return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"Step 3: Wire it into the agent loop
The system prompt only gets skill names. The full content loads via a tool:
from pathlib import Pathfrom skill_loader import SkillLoader
# Initialize loaderSKILL_LOADER = SkillLoader(Path("skills"))
# System prompt with only skill descriptions (cheap)SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:{SKILL_LOADER.get_descriptions()}
Use load_skill(name) to get detailed instructions for any skill."""
# Tool handler for load_skillTOOL_HANDLERS = { # ... other tools ... "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),}
# Define the tool schemaTOOLS = [ # ... other tools ... { "name": "load_skill", "description": "Load detailed instructions for a skill. Use when you need domain-specific guidance.", "input_schema": { "type": "object", "properties": { "name": { "type": "string", "description": "Skill name to load (e.g., 'code-review', 'git')" } }, "required": ["name"] } }]Step 4: Run the agent
Now when I ask the agent to do a code review, it loads the skill on demand:
User: Review the authentication module for security issues.
Agent: I'll load the code-review skill first to get the proper checklist.
[Tool use: load_skill("code-review")]
Agent: Now I have the security review guidelines loaded. Let me analyze the authentication module...
<skill name="code-review"># Code Review Skill## Review Checklist### 1. Security (Critical)- [ ] Injection vulnerabilities- [ ] Authentication issues...</skill>
[Agent proceeds with structured review...]The model only loaded the code-review skill because that’s what it needed. The PDF skill, git skill, and others stayed on disk.
The reason
Why does this pattern work so well?
1. Token efficiency
Compare the costs:
BEFORE (Push model):System prompt: 20,000 tokens (all skills loaded)Every request: Model processes all 20k tokens
AFTER (Pull model):System prompt: ~500 tokens (skill names only)When needed: Model loads ~2,000 tokens for ONE skill
Savings: 17,500+ tokens per request2. Context cleanliness
When the model has 10 skills in its system prompt, it has to filter through all of them to find relevant guidance. This is like searching for a book in a messy room.
With on-demand loading, the model has a clean workspace. It pulls only what it needs.
3. Pull vs Push model
PUSH MODEL (Anti-pattern):+------------------+| System Prompt || +--------------+ || | git skill | || | pdf skill | || | test skill | || | code-review | || | ... 6 more | || +--------------+ |+------------------+ | v Model receives ALL knowledge whether it needs it or not
PULL MODEL (Pattern):+------------------+| System Prompt || +--------------+ || | git: "..." | | <- Just descriptions| | pdf: "..." | || | test: "..." | || +--------------+ |+------------------+ | v Model sees what's AVAILABLE | | load_skill("code-review") v+------------------+| Tool Result || +--------------+ || | <skill> | || | Full code | || | review guide | || | </skill> | || +--------------+ |+------------------+The model decides when it needs knowledge. Not the engineer.
Common mistakes I made
Mistake 1: Loading skills in the wrong place
# WRONG: Loading skill content into system promptSYSTEM = f"""You are a coding agent.{SKILL_LOADER.get_content("git")} # Full skill in system prompt!{SKILL_LOADER.get_content("pdf")} # Another full skill!"""This defeats the entire purpose. The skill content should only appear in tool_result, never in the system prompt.
Mistake 2: Not providing skill descriptions
# WRONG: No way for model to know what skills existSYSTEM = "You are a coding agent. Use load_skill when needed."The model can’t request skills it doesn’t know about. Always include descriptions in the system prompt.
Mistake 3: Making skill files too long
# WRONG: 10,000 token skill fileThe model's context fills up with one skill. Keep skills focused:- One skill = one domain- ~2000 tokens per skill is a good target- Break large domains into multiple skillsWhen to use skills vs subagents
Skills and subagents solve different problems:
SKILLS: Use when you need KNOWLEDGE- Model stays in same conversation- Context is added via tool_result- Example: "How do I review code?" -> load code-review skill
SUBAGENTS: Use when you need ISOLATION- Model spawns a child agent- Child gets fresh context- Example: "Investigate this bug" -> spawn subagent with clean slateSkills add knowledge to the current agent. Subagents delegate work to a new agent.
Summary
In this post, I showed how on-demand skill loading keeps agent contexts clean. The key insight is using a pull model: the model requests knowledge when it needs it, rather than having all knowledge pushed into the system prompt upfront.
The implementation is simple:
- Put skill names/descriptions in the system prompt (cheap)
- Put full skill content in files
- Provide a
load_skilltool - Let the model decide when to load
This pattern saves tokens, keeps context focused, and scales to many skills without bloating the system prompt. When you have 50 skills, you don’t want 100,000 tokens in every system prompt. You want the model to pull the 2,000 tokens it actually needs.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments