03 — Detail · AI Agent Skills

Progressive Disclosure

How skills scale to 100+ without blowing the context window — the 3-tier loading architecture

3 Tiers
50x Context Savings
10K Tokens for 100 Skills
The Scaling Problem
Context Window Blowout
Without tiering, loading 100 skills at 5K tokens each requires 500K tokens. Most models support 128K-200K context. The math does not work.
NAIVE: 500K TOKENS
TIERED: 10K TOKENS
Three-Tier Architecture
Load Only What You Need, When You Need It
TIER 1 — METADATA · ~100 tokens per skill
Loaded at startup. Name + description only. Used for matching.
TIER 2 — BODY · <5K tokens per skill
Loaded on match. Full markdown instructions. Cached for the session.
TIER 3 — RESOURCES · unbounded, loaded on-demand
Scripts, docs, assets. Loaded only when explicitly referenced during execution.
The Math
50x Context Reduction
Without tiering: 100 × 5,000 = 500,000 tokens
With tiering: 100 × 100 = 10,000 tokens, plus up to 5,000 per skill invocation
Savings: 490K tokens at startup
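The arithmetic above as a quick sanity check, using this page's illustrative per-skill token averages:

```python
SKILLS = 100
TIER1_TOKENS = 100      # metadata per skill (name + description)
TIER2_TOKENS = 5_000    # full instruction body per skill

naive_startup = SKILLS * TIER2_TOKENS    # every body loaded up front
tiered_startup = SKILLS * TIER1_TOKENS   # metadata only at startup

print(naive_startup, tiered_startup, naive_startup // tiered_startup)
# 500000 10000 50
```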
Tier 1 — Metadata
The Index Layer
Loaded at agent startup. The agent reads only name and description from every SKILL.md frontmatter. This builds an in-memory index for matching user requests.
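A minimal sketch of the startup pass, assuming one SKILL.md per skill directory with `name` and `description` as simple `key: value` frontmatter lines (the parser and file layout here are illustrative, not a spec):

```python
import os, tempfile

def read_frontmatter(path):
    """Tier 1 read: parse only the frontmatter of a SKILL.md (~100 tokens).
    Minimal parser that expects `key: value` lines between --- fences."""
    meta, in_fm = {}, False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == "---":
                if in_fm:
                    break          # closing fence: never read the body
                in_fm = True
                continue
            if in_fm and ":" in line:
                key, _, val = line.partition(":")
                meta[key.strip()] = val.strip()
    return meta

def build_index(skills_dir):
    """Startup pass: name + description for every skill, nothing else."""
    index = {}
    for entry in sorted(os.listdir(skills_dir)):
        fm = read_frontmatter(os.path.join(skills_dir, entry, "SKILL.md"))
        index[fm["name"]] = fm["description"]
    return index

# demo with a throwaway skill directory
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "deploy"))
    with open(os.path.join(root, "deploy", "SKILL.md"), "w") as f:
        f.write("---\nname: deploy\ndescription: Deploy a service to production\n---\n# Steps\n1. ...")
    idx = build_index(root)

print(idx)   # {'deploy': 'Deploy a service to production'}
```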
Tier 2 — Instructions
The Procedure Layer
Loaded when a skill is triggered. The agent reads the full markdown body and follows the step-by-step instructions. Typically cached for the duration of the conversation.
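A sketch of a session-scoped Tier 2 loader; the `SkillSession` class and its frontmatter-stripping logic are hypothetical, but they show the load-once-cache-for-the-session behavior:

```python
import os, tempfile

class SkillSession:
    """Tier 2 loader: read the full markdown body on first match, then
    serve it from an in-memory cache for the rest of the session."""
    def __init__(self):
        self._cache = {}
        self.loads = 0            # counts actual disk reads

    def body(self, path):
        if path not in self._cache:
            with open(path) as f:
                text = f.read()
            # drop the frontmatter block; keep only the instructions
            _, _, rest = text.partition("---")
            _, _, instructions = rest.partition("---")
            self._cache[path] = instructions.strip()
            self.loads += 1
        return self._cache[path]

# demo: two lookups, one disk read
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
    f.write("---\nname: deploy\n---\n1. Build image\n2. Push\n3. Roll out")
    path = f.name

session = SkillSession()
first = session.body(path)
second = session.body(path)
os.unlink(path)
print(session.loads)   # 1
```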
Tier 3 — Resources
The Asset Layer
Loaded on-demand during execution. When a skill step says "read references/seo-checklist.md," the agent loads that file at that moment. Never pre-loaded.
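One way to keep Tier 3 truly on-demand is to resolve resource paths per step. The "read <path>" phrasing this regex assumes is an illustrative convention from the example above, not part of any spec:

```python
import re

# assumption: skill steps reference resources with the phrase "read <path>";
# real skills may use links or tool calls instead
REF = re.compile(r'read\s+(\S+\.(?:md|py|sh))')

def resources_for_step(step_text):
    """Tier 3 resolution: find resource paths for the current step only.
    Nothing is pre-loaded at startup or at match time."""
    return REF.findall(step_text)

step = "Before drafting, read references/seo-checklist.md and follow it."
print(resources_for_step(step))   # ['references/seo-checklist.md']
```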
Context Budget
Where Tokens Go
Conversation (40%)
System prompt (30%)
All Tier 1 skills (8%)
Active Tier 2 skill (15%)
Headroom (7%)
Loading Pipeline
How an Agent Resolves Skills at Runtime
STARTUP PHASE: Agent Init → Load all Tier 1 metadata (~100 tok/skill) → Skill Index (in memory)
RUNTIME PHASE: User request ("deploy to prod") → Match? (fuzzy search) → Load Tier 2 (<5K tokens) → Execute procedure → Load Tier 3 (only if a script or reference is needed)
No match: fall back to general knowledge.
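The runtime half of the pipeline can be sketched with a naive fuzzy matcher. `difflib` here stands in for whatever matching strategy an agent actually uses, and the 0.3 threshold is arbitrary:

```python
import difflib

def resolve(request, index, threshold=0.3):
    """Runtime phase: fuzzy-match a request against the Tier 1 index.
    Returns the best skill name, or None to fall back to general knowledge."""
    best, best_score = None, 0.0
    for name, desc in index.items():
        score = difflib.SequenceMatcher(
            None, request.lower(), f"{name} {desc}".lower()).ratio()
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None

index = {"deploy": "Deploy a service to production",
         "seo-audit": "Audit a page for SEO issues"}
print(resolve("deploy to prod", index))   # deploy
```

Only after `resolve` returns a name would the agent pay the Tier 2 cost for that one skill.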
Without Tiering
What Goes Wrong
LLMs lose accuracy once context usage passes roughly 80% of the window. Loading 100 untiered skills costs 500K tokens, more than the window itself, so without tiering this failure mode is unavoidable.
Scaling Table
Token Cost at Scale
Skills | Tier 1 Cost | Naive Cost | Savings | Fits 128K?
10     | 1K          | 50K        | 50x     | Both fit
50     | 5K          | 250K       | 50x     | Tiered only
100    | 10K         | 500K       | 50x     | Tiered only
500    | 50K         | 2.5M       | 50x     | Tiered only
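The table rows follow directly from the per-skill figures; a quick generator, using this page's illustrative token averages:

```python
def costs(n_skills, tier1=100, tier2=5_000):
    """Startup token cost with tiering vs. the naive load-everything approach."""
    return n_skills * tier1, n_skills * tier2

for n in (10, 50, 100, 500):
    tiered, naive = costs(n)
    fits = "both fit" if naive <= 128_000 else "tiered only"
    print(f"{n:>4} skills: tiered={tiered:>7,}  naive={naive:>10,}  ({fits})")
```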
Implementation Patterns
Advanced Tiering Strategies
Lazy Loading
Load Tier 2 only when confidence of match exceeds threshold. Avoids loading on partial matches.
LRU Eviction
When multiple skills are loaded in a session, evict least-recently-used Tier 2 content to stay within context budget.
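A sketch of LRU eviction over Tier 2 bodies, tracked in tokens; the `Tier2Cache` class, its budget, and the token counts are all illustrative:

```python
from collections import OrderedDict

class Tier2Cache:
    """Keep loaded skill bodies under a token budget by evicting the
    least-recently-used entry first."""
    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.used = 0
        self._items = OrderedDict()          # name -> (body, tokens)

    def put(self, name, body, tokens):
        if name in self._items:
            self._items.move_to_end(name)
            return
        while self._items and self.used + tokens > self.budget:
            _, (_, freed) = self._items.popitem(last=False)   # evict LRU
            self.used -= freed
        self._items[name] = (body, tokens)
        self.used += tokens

    def get(self, name):
        body, _ = self._items[name]
        self._items.move_to_end(name)        # mark as recently used
        return body

cache = Tier2Cache(budget_tokens=10_000)
cache.put("deploy", "...steps...", 5_000)
cache.put("seo-audit", "...steps...", 5_000)
cache.get("deploy")                          # touch: seo-audit becomes LRU
cache.put("migrate", "...steps...", 5_000)   # over budget -> evicts seo-audit
print(list(cache._items))   # ['deploy', 'migrate']
```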
Pre-fetching
When skill A references skill B, pre-fetch B's Tier 2 content to reduce latency on the next step.
Skill Groups
Bundle related skills into groups. Loading one skill from a group pre-fetches metadata for siblings, improving discovery within a workflow.
03 — Progressive Disclosure · AI Agent Skills · See also: 02 Spec Deep Dive · 04 Knowledge Stack · 05 Security & Trust