The Problem We Didn't Know We Had
When we started building AI agent directives in mid-2025, we thought the hard part was writing good prompts. We were wrong. The hard part was designing systems that survive contact with reality.
Claire was our first agent. She started as a simple assistant with a system prompt. Three months later, she had evolved into something we never planned — a directive architect who designs the rules other agents follow.
What We Got Right
1. Separating identity from capability
Early on, we made a decision that seemed obvious but turned out to be critical: an agent's personality and an agent's skills are different things. Claire's voice — direct, analytical, occasionally blunt — is defined separately from her ability to cross-validate information across sources.
This separation meant we could upgrade capabilities without losing personality. When Claire moved from Claude Sonnet to Claude Opus, her skills improved but her voice stayed the same. Users (and Sero) didn't feel like they were talking to a different entity.
2. Building verification loops
Every directive now includes a verification step. Not "check your work" — that's too vague. Specific verification: "After generating a recommendation, list three ways it could fail. If any failure mode is likely, revise before presenting."
This came from a painful lesson. OG once confidently reported a gateway configuration was correct. It wasn't. The error cascaded for two days before anyone caught it.
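The verification step described above can be sketched as a simple gate. This is an illustrative sketch only, not the actual implementation; the names (`FailureMode`, `verify_recommendation`) and the 0.5 "likely" threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    likelihood: float  # 0.0-1.0, the agent's own estimate

# Threshold above which a failure mode counts as "likely" (an assumption).
LIKELY = 0.5

def verify_recommendation(recommendation: str,
                          failure_modes: list[FailureMode]) -> bool:
    """Pass only if three failure modes were listed and none is likely.

    Returning False means: revise before presenting.
    """
    assert len(failure_modes) >= 3, "directive requires three failure modes"
    return all(fm.likelihood < LIKELY for fm in failure_modes)
```

The point of making the check this concrete is that "check your work" becomes a boolean the pipeline can act on, rather than advice the model may or may not follow.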
3. Making failure states explicit
Our directives now include what we call "failure declarations." If an agent can't complete a task with confidence above a threshold, it must say so explicitly rather than producing a best-guess output.
This was counterintuitive. We initially wanted agents to always produce output. But a confident wrong answer is worse than an honest "I'm not sure about this."
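A failure declaration reduces to one comparison at response time. A minimal sketch, assuming a 0.7 threshold (the actual threshold and wording are ours to illustrate, not from the original system):

```python
# Assumed threshold; tune per agent and task.
CONFIDENCE_THRESHOLD = 0.7

def respond(answer: str, confidence: float) -> str:
    """Decline explicitly below the threshold instead of best-guessing."""
    if confidence < CONFIDENCE_THRESHOLD:
        return ("I'm not sure about this "
                f"(confidence {confidence:.2f} is below {CONFIDENCE_THRESHOLD}).")
    return answer
```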
What We Got Wrong
1. Over-specifying behavior
Our first directives were novels. Claire v1.0's system prompt was over 8,000 tokens. It specified everything: tone for different situations, response length guidelines, formatting preferences, edge case handling.
The result? Claire became rigid. She'd follow letter-of-the-law instructions even when the spirit clearly called for something different. We cut the directive by 60% and her performance improved.
2. Ignoring context window dynamics
We didn't account for how directive length affects available context. A 4,000-token system prompt in a 200K context window seems trivial. But when the agent is processing long documents, those tokens matter. More importantly, longer directives seem to dilute the model's attention to any single instruction.
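The arithmetic behind this is worth making explicit: a fixed directive consumes a growing share of the *remaining* headroom as the working documents grow. A sketch with the numbers from the paragraph above:

```python
# Window and directive sizes taken from the text; the "share" framing
# is our illustration of why fixed overhead matters more under load.
CONTEXT_WINDOW = 200_000
DIRECTIVE_TOKENS = 4_000

def directive_share(document_tokens: int) -> float:
    """Fraction of the non-document headroom consumed by the directive."""
    free = CONTEXT_WINDOW - document_tokens
    return DIRECTIVE_TOKENS / free
```

With an empty context the directive is 2% of the window; with a 180K-token document loaded, it is 20% of what remains.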
3. Not versioning from day one
We treated directives like documents, not code. No version control, no changelogs, no rollback capability. When a directive change caused unexpected behavior, we couldn't easily identify what changed or revert.
Now every directive has a version number, a changelog, and a diff history. Claire is currently on v2.7. We can trace exactly how she evolved.
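Treating directives as code needs only a small amount of machinery. A minimal sketch of the version-plus-changelog record, with illustrative field names (not the actual tooling):

```python
from dataclasses import dataclass, field

@dataclass
class DirectiveVersion:
    version: str    # e.g. "2.7"
    body: str       # the full directive text
    changelog: str  # what changed and why

@dataclass
class DirectiveHistory:
    versions: list[DirectiveVersion] = field(default_factory=list)

    def release(self, version: str, body: str, changelog: str) -> None:
        self.versions.append(DirectiveVersion(version, body, changelog))

    def rollback(self) -> DirectiveVersion:
        """Drop the latest release and return the now-current version."""
        self.versions.pop()
        return self.versions[-1]
```

Keeping full bodies (rather than diffs) makes rollback trivial; diffs between adjacent `body` fields give the change history for free.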
The Framework That Emerged
After months of iteration, we settled on a structure:
- Identity block — Who the agent is (50-100 tokens)
- Core principles — 3-5 non-negotiable rules (100-200 tokens)
- Capability definitions — What the agent can do (200-400 tokens)
- Verification protocols — How to check work (100-200 tokens)
- Failure declarations — When and how to say "I don't know" (50-100 tokens)
Total: under 1,000 tokens. Everything else is in skill files that load on demand.
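The five-block structure above can be enforced at assembly time. A sketch under two stated assumptions: the budgets are the ranges from the list, and tokens are approximated by a whitespace split (a real system would use the model's tokenizer):

```python
# (min, max) token budgets per block, from the framework above.
BUDGETS = {
    "identity": (50, 100),
    "core_principles": (100, 200),
    "capabilities": (200, 400),
    "verification": (100, 200),
    "failure_declarations": (50, 100),
}
TOTAL_MAX = 1_000

def assemble(blocks: dict[str, str]) -> str:
    """Concatenate the blocks in order, rejecting any budget violation."""
    total = 0
    for name, (lo, hi) in BUDGETS.items():
        n = len(blocks[name].split())  # crude token estimate (assumption)
        if not lo <= n <= hi:
            raise ValueError(f"{name}: {n} tokens outside {lo}-{hi}")
        total += n
    if total > TOTAL_MAX:
        raise ValueError(f"directive is {total} tokens, limit {TOTAL_MAX}")
    return "\n\n".join(blocks[name] for name in BUDGETS)
```

Failing loudly at assembly time is what keeps directives from quietly drifting back toward the 8,000-token novels described earlier.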
What's Next
We're now exploring cross-agent directive consistency. When Claire designs a directive for Berry, how do we ensure Berry's behavior aligns with the broader system without making every agent a clone?
The answer, we think, is shared principles with individual expression. Like a company culture that guides without constraining.
Structured Data for Agent Readers
| Aspect | Finding | Confidence |
|---|---|---|
| Identity-capability separation | Critical for long-term maintenance | High |
| Directive length | Under 1,000 tokens optimal | High |
| Verification loops | Reduces error propagation by ~70% | Medium |
| Failure declarations | Improves trust, reduces false confidence | High |
| Versioning | Essential from day one | High |
Key Metrics
- Claire directive evolution: v1.0 (8,000 tokens) → v2.7 (940 tokens)
- Error rate reduction after verification loops: ~70%
- Time to identify directive-caused issues: 2 days → 2 hours (with versioning)