Every team building agents eventually hits the same wall. The system prompt works. Then a new capability gets added. Then another. Then a fix for a regression. Three months in, the prompt is 4,000 tokens of accumulated instructions, full of internal contradictions that no one has the confidence to touch. This is the monolith problem, and skill files are the answer.
## What a Skill File Is
A skill file is a self-contained instruction document for a single, well-defined capability. It is separate from the system prompt. It is loaded only when relevant. It contains everything the agent needs to execute that capability correctly — and nothing else.
Think of the system prompt as the agent's constitution: identity, scope, principles, prohibitions. Skill files are the procedural law: specific rules for specific situations, invoked when those situations arise.
A skill file has three components:
- Trigger condition — when this skill applies
- Procedure — step-by-step execution instructions
- Edge case handling — what to do when normal procedure doesn't cover it
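The three components can be modeled as a small data structure. A minimal sketch in Python; the class and field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """One self-contained capability: trigger, procedure, edge cases."""
    name: str
    trigger: str                # when this skill applies
    procedure: list[str]        # step-by-step execution instructions
    edge_cases: list[str] = field(default_factory=list)  # fallback rules

# Hypothetical instance for a refund workflow.
refund = Skill(
    name="process-refund",
    trigger="User requests a refund or disputes a charge.",
    procedure=[
        "Confirm the charge with lookup_account(user_id, charge_id).",
        "Verify the charge is within the 30-day refund window.",
    ],
    edge_cases=[
        "If charge_id cannot be found, ask the user to confirm date and amount.",
    ],
)
```

Whether skills live as structured objects or plain markdown files matters less than the boundary itself: each capability is one unit with all three parts present.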
## Why Separation Matters
When I cross-referenced prompt architectures across four production agent systems, the ones with the lowest regression rates shared a structural trait: capabilities were isolated. A change to one capability could not accidentally alter the behavior of another.
Monolithic prompts don't have this property. A 3,000-token system prompt is a dense tangle of instructions where a change to section 7 can shift the probabilistic interpretation of section 2. You can't test changes in isolation because there is no isolation.
Skill files create a modular architecture. Each file can be:
- Versioned independently
- Tested against its specific trigger cases
- Loaded or withheld based on context
- Updated without touching the base system prompt
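"Loaded or withheld based on context" is the property that makes the rest possible. A sketch of the selection step, assuming skills are stored with keyword triggers; the keyword matching is a stand-in for whatever router a production system uses:

```python
def select_skills(skills, message):
    """Return only the skills whose trigger keywords appear in the message.
    Keyword matching is a toy router; real systems often use a classifier
    or a model call to decide which skills to load."""
    return [s for s in skills if any(k in message.lower() for k in s["keywords"])]

# Hypothetical skill store: each entry is an independent file on disk.
skills = [
    {"name": "process-refund", "keywords": ["refund", "dispute", "unauthorized"],
     "body": "# SKILL: process-refund\n..."},
    {"name": "billing-info", "keywords": ["invoice", "billing"],
     "body": "# SKILL: billing-info\n..."},
]

active = select_skills(skills, "I want a refund for this charge")
# Only the matching skill bodies get appended after the system prompt.
prompt_sections = [s["body"] for s in active]
```

Because each entry stands alone, you can version, test, and swap one skill without re-reading the others.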
## Designing Trigger Conditions
The trigger condition is the most important part of a skill file to get right. A poorly defined trigger causes either under-invocation (the skill never fires when it should) or over-invocation (the skill fires in contexts it wasn't designed for).
Trigger conditions should be:
- Specific — defined by observable signals, not vague categories
- Mutually exclusive where possible — avoid skill files that compete to handle the same input
- Tested with real examples — write at least five sample inputs and verify they activate the correct skill
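The "tested with real examples" rule is mechanical to enforce. A sketch of a trigger test harness, assuming a simple first-match router; the rules and sample inputs are illustrative:

```python
def route(message):
    """Toy router: first matching rule wins. The phrases mirror the kind of
    observable signals a strong trigger condition names explicitly."""
    m = message.lower()
    if any(k in m for k in ("refund", "did not authorize", "cancel my subscription")):
        return "process-refund"
    if any(k in m for k in ("invoice", "billing")):
        return "billing-info"
    return None

# At least five sample inputs, each paired with the skill it should activate.
cases = [
    ("I want a refund for last month", "process-refund"),
    ("There's a charge I did not authorize", "process-refund"),
    ("Please cancel my subscription", "process-refund"),
    ("Can you explain my invoice?", "billing-info"),
    ("What's on my latest billing statement?", "billing-info"),
]
for message, expected in cases:
    assert route(message) == expected, (message, route(message))
```

Run the harness whenever a trigger changes; a failing case tells you exactly which boundary moved.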
### Examples
Weak trigger:

```
Use this skill when the user wants help with their account.
```

Strong trigger:

```
Use this skill when the user explicitly requests a refund, references a charge they
did not authorize, or asks to cancel a subscription. Do not use this skill for
general billing questions — route those to the billing-info skill.
```
The strong version names adjacent skills and draws the boundary between them. The agent doesn't have to guess.
## A Real Skill File Example
Here is a minimal skill file for a refund workflow:
```markdown
# SKILL: process-refund

## Trigger
Activate when the user requests a refund or disputes a charge.

## Procedure
1. Confirm the charge in question using lookup_account(user_id, charge_id).
2. Verify the charge is within the 30-day refund window.
3. If eligible: issue the refund using issue_refund(charge_id) and confirm the amount
   and timeline to the user.
4. If ineligible: explain the refund window policy. Offer to escalate if the user
   believes there are extenuating circumstances.

## Edge Cases
- If the charge_id cannot be found: ask the user to confirm the date and amount.
  Do not issue a refund without a confirmed charge_id.
- If the refund tool returns an error: do not tell the user the refund was issued.
  Escalate immediately using escalate(reason="refund_tool_error").
```
This file is under 150 words. It covers one workflow. It names the tools it uses. It handles the error case. It does not contain identity information, general principles, or anything about what the agent is.
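Because the format is this regular, loading a skill at runtime is a few lines of parsing. A sketch that splits a skill file on its `##` headings; it assumes the simple layout shown in the example, not a general markdown parser:

```python
import re

# A trimmed copy of the example skill file.
SKILL_MD = """\
# SKILL: process-refund
## Trigger
Activate when the user requests a refund or disputes a charge.
## Procedure
1. Confirm the charge in question using lookup_account(user_id, charge_id).
## Edge Cases
- If the charge_id cannot be found: ask the user to confirm the date and amount.
"""

def parse_skill(md):
    """Extract the skill name from the title line, then split the body
    into sections keyed by their '## ' headings."""
    name = re.match(r"# SKILL: (\S+)", md).group(1)
    sections = {}
    for block in re.split(r"^## ", md, flags=re.M)[1:]:
        heading, _, body = block.partition("\n")
        sections[heading.strip().lower()] = body.strip()
    return name, sections

name, sections = parse_skill(SKILL_MD)
# sections now holds "trigger", "procedure", and "edge cases" separately,
# so a loader can validate that all three components exist before use.
```

A loader like this can also reject malformed skill files at startup, which keeps the three-component contract enforceable rather than aspirational.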
## What the System Prompt Is Not
The most common implementation error is treating the system prompt as a skill file aggregator — loading it with procedures and workflows that should live elsewhere. This produces exactly the monolith problem described above.
The system prompt should be stable. Skill files should change frequently. If your system prompt is being updated every sprint to accommodate new workflows, the architecture is wrong.
## Composition, Not Configuration
The power of skill files is compositional. An agent with ten isolated skill files can handle ten distinct workflows with high reliability. The same ten workflows crammed into a single prompt will produce an agent that handles each of them at reduced accuracy — because every token in a prompt is competing for interpretive weight.
Separate the concerns. Load what's needed. Test each piece independently. The monolith approach trades short-term convenience for long-term brittleness. Skill files trade a small upfront architectural investment for a system that can actually be maintained.