Skip to content

How Skill Loading Works in AI Agents: On-Demand Knowledge Injection

Problem

My agent’s system prompt was massive. I had 10 different skills stuffed into it: git workflow, code review, testing patterns, PDF processing, MCP building. Each skill was about 2000 tokens. That’s 20,000 tokens of knowledge the model had to wade through before even starting a task.

The problem? Most of that knowledge was irrelevant to any given task.

When I asked my agent to review a pull request, it had all the PDF processing instructions loaded. When I asked it to build an MCP server, it had all the code review checklists in context. I was paying for tokens the model never used.

Here’s what my system prompt looked like:

System Prompt (BEFORE)
You are a coding agent with the following capabilities:
## Git Workflow
1. Always create a branch before changes
2. Use conventional commit messages
3. Run tests before pushing
... (500 tokens)
## Code Review
1. Check for security issues
2. Verify test coverage >= 80%
3. Look for performance problems
... (600 tokens)
## PDF Processing
1. Use PyMuPDF for reading PDFs
2. Use ReportLab for creating PDFs
3. Handle encoding issues carefully
... (400 tokens)
## Testing Best Practices
1. Write tests first (TDD)
2. Use descriptive test names
3. Mock external dependencies
... (500 tokens)
Total: ~2000 tokens per skill x 10 skills = 20,000 tokens

I was burning through context window space before the conversation even started.

What happened?

I searched for how real agent systems handle knowledge and found the learn-claude-code repository. The key insight was in session s05:

“Load knowledge when you need it, not upfront” — inject via tool_result, not the system prompt.

Looking at my approach, I could see the problem clearly:

  • Push model: I was pushing all knowledge into the system prompt upfront
  • Wasted tokens: The model paid attention to irrelevant skills
  • Diluted focus: With 10 skills in context, the model had to filter through noise

The solution was a pull model: the model requests knowledge when it needs it.

How to solve it?

I rewrote my agent to use on-demand skill loading. The pattern uses two layers:

Layer 1: skill NAMES in system prompt (cheap)
+--------------------------------------+
| You are a coding agent. |
| Skills available: |
| - git: Git workflow helpers | ~100 tokens total
| - test: Testing best practices |
| - pdf: PDF processing |
+--------------------------------------+
Layer 2: skill BODY via tool_result (on demand)
+--------------------------------------+
| <skill name="git"> |
| Full git workflow instructions... | ~2000 tokens
| Step 1: Create branch... |
| Step 2: Make changes... |
| </skill> |
+--------------------------------------+

Here’s how I implemented it:

Step 1: Create skill files

Each skill is a SKILL.md file in a dedicated directory:

Directory Structure
skills/
pdf/
SKILL.md # PDF processing instructions
code-review/
SKILL.md # Code review checklist
mcp-builder/
SKILL.md # MCP server building guide
git/
SKILL.md # Git workflow patterns

Each SKILL.md has YAML frontmatter with a name and description:

skills/code-review/SKILL.md
---
name: code-review
description: Perform thorough code reviews with security, performance, and maintainability analysis. Use when user asks to review code, check for bugs, or audit a codebase.
---
# Code Review Skill
You now have expertise in conducting comprehensive code reviews. Follow this structured approach:
## Review Checklist
### 1. Security (Critical)
Check for:
- [ ] Injection vulnerabilities: SQL, command, XSS
- [ ] Authentication issues: Hardcoded credentials
- [ ] Data exposure: Sensitive data in logs
### 2. Correctness
Check for:
- [ ] Logic errors: Off-by-one, null handling
- [ ] Race conditions: Concurrent access
- [ ] Error handling: Swallowed exceptions
...

Step 2: Create a SkillLoader

The loader scans for SKILL.md files and provides two things: descriptions (cheap) and content (expensive):

skill_loader.py
from pathlib import Path
import yaml
class SkillLoader:
def __init__(self, skills_dir: Path):
self.skills = {}
for f in sorted(skills_dir.rglob("SKILL.md")):
text = f.read_text()
meta, body = self._parse_frontmatter(text)
name = meta.get("name", f.parent.name)
self.skills[name] = {"meta": meta, "body": body}
def _parse_frontmatter(self, text: str):
"""Parse YAML frontmatter from skill file."""
if text.startswith("---"):
parts = text.split("---", 2)
if len(parts) >= 3:
meta = yaml.safe_load(parts[1])
return meta, parts[2].strip()
return {}, text
def get_descriptions(self) -> str:
"""Return lightweight skill list for system prompt."""
lines = []
for name, skill in self.skills.items():
desc = skill["meta"].get("description", "")
lines.append(f" - {name}: {desc}")
return "\n".join(lines)
def get_content(self, name: str) -> str:
"""Return full skill content when model requests it."""
skill = self.skills.get(name)
if not skill:
return f"Error: Unknown skill '{name}'."
return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"

Step 3: Wire it into the agent loop

The system prompt only gets skill names. The full content loads via a tool:

agent.py
from pathlib import Path
from skill_loader import SkillLoader
# Initialize loader
SKILL_LOADER = SkillLoader(Path("skills"))
# System prompt with only skill descriptions (cheap)
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}
Use load_skill(name) to get detailed instructions for any skill."""
# Tool handler for load_skill
TOOL_HANDLERS = {
# ... other tools ...
"load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
# Define the tool schema
TOOLS = [
# ... other tools ...
{
"name": "load_skill",
"description": "Load detailed instructions for a skill. Use when you need domain-specific guidance.",
"input_schema": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Skill name to load (e.g., 'code-review', 'git')"
}
},
"required": ["name"]
}
}
]

Step 4: Run the agent

Now when I ask the agent to do a code review, it loads the skill on demand:

Conversation
User: Review the authentication module for security issues.
Agent: I'll load the code-review skill first to get the proper checklist.
[Tool use: load_skill("code-review")]
Agent: Now I have the security review guidelines loaded. Let me analyze the authentication module...
<skill name="code-review">
# Code Review Skill
## Review Checklist
### 1. Security (Critical)
- [ ] Injection vulnerabilities
- [ ] Authentication issues
...
</skill>
[Agent proceeds with structured review...]

The model only loaded the code-review skill because that’s what it needed. The PDF skill, git skill, and others stayed on disk.

The reason

Why does this pattern work so well?

1. Token efficiency

Compare the costs:

Token Comparison
BEFORE (Push model):
System prompt: 20,000 tokens (all skills loaded)
Every request: Model processes all 20k tokens
AFTER (Pull model):
System prompt: ~500 tokens (skill names only)
When needed: Model loads ~2,000 tokens for ONE skill
Savings: 17,500+ tokens per request

2. Context cleanliness

When the model has 10 skills in its system prompt, it has to filter through all of them to find relevant guidance. This is like searching for a book in a messy room.

With on-demand loading, the model has a clean workspace. It pulls only what it needs.

3. Pull vs Push model

PUSH MODEL (Anti-pattern):
+------------------+
| System Prompt |
| +--------------+ |
| | git skill | |
| | pdf skill | |
| | test skill | |
| | code-review | |
| | ... 6 more | |
| +--------------+ |
+------------------+
|
v
Model receives ALL knowledge
whether it needs it or not
PULL MODEL (Pattern):
+------------------+
| System Prompt |
| +--------------+ |
| | git: "..." | | <- Just descriptions
| | pdf: "..." | |
| | test: "..." | |
| +--------------+ |
+------------------+
|
v
Model sees what's AVAILABLE
|
| load_skill("code-review")
v
+------------------+
| Tool Result |
| +--------------+ |
| | <skill> | |
| | Full code | |
| | review guide | |
| | </skill> | |
| +--------------+ |
+------------------+

The model decides when it needs knowledge. Not the engineer.

Common mistakes I made

Mistake 1: Loading skills in the wrong place

Wrong approach
# WRONG: Loading skill content into system prompt
SYSTEM = f"""You are a coding agent.
{SKILL_LOADER.get_content("git")} # Full skill in system prompt!
{SKILL_LOADER.get_content("pdf")} # Another full skill!
"""

This defeats the entire purpose. The skill content should only appear in tool_result, never in the system prompt.

Mistake 2: Not providing skill descriptions

Wrong approach
# WRONG: No way for model to know what skills exist
SYSTEM = "You are a coding agent. Use load_skill when needed."

The model can’t request skills it doesn’t know about. Always include descriptions in the system prompt.

Mistake 3: Making skill files too long

Wrong approach
# WRONG: 10,000 token skill file
The model's context fills up with one skill. Keep skills focused:
- One skill = one domain
- ~2000 tokens per skill is a good target
- Break large domains into multiple skills

When to use skills vs subagents

Skills and subagents solve different problems:

Decision Guide
SKILLS: Use when you need KNOWLEDGE
- Model stays in same conversation
- Context is added via tool_result
- Example: "How do I review code?" -> load code-review skill
SUBAGENTS: Use when you need ISOLATION
- Model spawns a child agent
- Child gets fresh context
- Example: "Investigate this bug" -> spawn subagent with clean slate

Skills add knowledge to the current agent. Subagents delegate work to a new agent.

Summary

In this post, I showed how on-demand skill loading keeps agent contexts clean. The key insight is using a pull model: the model requests knowledge when it needs it, rather than having all knowledge pushed into the system prompt upfront.

The implementation is simple:

  1. Put skill names/descriptions in the system prompt (cheap)
  2. Put full skill content in files
  3. Provide a load_skill tool
  4. Let the model decide when to load

This pattern saves tokens, keeps context focused, and scales to many skills without bloating the system prompt. When you have 50 skills, you don’t want 100,000 tokens in every system prompt. You want the model to pull the 2,000 tokens it actually needs.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments