03 — Detail · AI Agent Skills

Progressive Disclosure

How skills scale to 100+ without blowing the context window — the 3-tier loading architecture

3 Tiers
50x Context Savings
10K Tokens for 100 Skills
The Scaling Problem
Context Window Blowout
Without tiering, loading 100 skills at 5K tokens each requires 500K tokens. Most models support 128K-200K context. The math does not work.
NAIVE: 500K TOKENS
TIERED: 10K TOKENS
Three-Tier Architecture
Load Only What You Need, When You Need It
TIER 1 — METADATA · ~100 tokens per skill
Loaded at startup. Name + description only. Used for matching.
TIER 2 — BODY · <5K tokens per skill
Loaded on match. Full markdown instructions. Cached for the session.
TIER 3 — RESOURCES · unbounded, loaded on-demand
Scripts, docs, assets. Loaded only when explicitly referenced during execution.
The Math
50x Context Reduction
Without tiering: 100 × 5,000 = 500,000 tokens
With tiering: 100 × 100 = 10,000 tokens, plus up to 5,000 per skill invocation
Savings: 490K tokens at startup
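The arithmetic above as a quick sanity check, using this page's illustrative per-skill token averages:

```python
SKILLS = 100
TIER1_TOKENS = 100      # metadata per skill (name + description)
TIER2_TOKENS = 5_000    # full instruction body per skill

naive_startup = SKILLS * TIER2_TOKENS    # every body loaded up front
tiered_startup = SKILLS * TIER1_TOKENS   # metadata only at startup

print(naive_startup, tiered_startup, naive_startup // tiered_startup)
# 500000 10000 50
```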
Tier 1 — Metadata
The Index Layer
Loaded at agent startup. The agent reads only name and description from every SKILL.md frontmatter. This builds an in-memory index for matching user requests.
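A minimal sketch of the startup pass, assuming one SKILL.md per skill directory with `name` and `description` as simple `key: value` frontmatter lines (the parser and file layout here are illustrative, not a spec):

```python
import os, tempfile

def read_frontmatter(path):
    """Tier 1 read: parse only the frontmatter of a SKILL.md (~100 tokens).
    Minimal parser that expects `key: value` lines between --- fences."""
    meta, in_fm = {}, False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == "---":
                if in_fm:
                    break          # closing fence: never read the body
                in_fm = True
                continue
            if in_fm and ":" in line:
                key, _, val = line.partition(":")
                meta[key.strip()] = val.strip()
    return meta

def build_index(skills_dir):
    """Startup pass: name + description for every skill, nothing else."""
    index = {}
    for entry in sorted(os.listdir(skills_dir)):
        fm = read_frontmatter(os.path.join(skills_dir, entry, "SKILL.md"))
        index[fm["name"]] = fm["description"]
    return index

# demo with a throwaway skill directory
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "deploy"))
    with open(os.path.join(root, "deploy", "SKILL.md"), "w") as f:
        f.write("---\nname: deploy\ndescription: Deploy a service to production\n---\n# Steps\n1. ...")
    idx = build_index(root)

print(idx)   # {'deploy': 'Deploy a service to production'}
```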
Tier 2 — Instructions
The Procedure Layer
Loaded when a skill is triggered. The agent reads the full markdown body and follows the step-by-step instructions. Typically cached for the duration of the conversation.
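A sketch of a session-scoped Tier 2 loader; the `SkillSession` class and its frontmatter-stripping logic are hypothetical, but they show the load-once-cache-for-the-session behavior:

```python
import os, tempfile

class SkillSession:
    """Tier 2 loader: read the full markdown body on first match, then
    serve it from an in-memory cache for the rest of the session."""
    def __init__(self):
        self._cache = {}
        self.loads = 0            # counts actual disk reads

    def body(self, path):
        if path not in self._cache:
            with open(path) as f:
                text = f.read()
            # drop the frontmatter block; keep only the instructions
            _, _, rest = text.partition("---")
            _, _, instructions = rest.partition("---")
            self._cache[path] = instructions.strip()
            self.loads += 1
        return self._cache[path]

# demo: two lookups, one disk read
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
    f.write("---\nname: deploy\n---\n1. Build image\n2. Push\n3. Roll out")
    path = f.name

session = SkillSession()
first = session.body(path)
second = session.body(path)
os.unlink(path)
print(session.loads)   # 1
```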
Tier 3 — Resources
The Asset Layer
Loaded on-demand during execution. When a skill step says "read references/seo-checklist.md," the agent loads that file at that moment. Never pre-loaded.
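One way to keep Tier 3 truly on-demand is to resolve resource paths per step. The "read <path>" phrasing this regex assumes is an illustrative convention from the example above, not part of any spec:

```python
import re

# assumption: skill steps reference resources with the phrase "read <path>";
# real skills may use links or tool calls instead
REF = re.compile(r'read\s+(\S+\.(?:md|py|sh))')

def resources_for_step(step_text):
    """Tier 3 resolution: find resource paths for the current step only.
    Nothing is pre-loaded at startup or at match time."""
    return REF.findall(step_text)

step = "Before drafting, read references/seo-checklist.md and follow it."
print(resources_for_step(step))   # ['references/seo-checklist.md']
```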
Context Budget
Where Tokens Go
Conversation (40%)
System prompt (30%)
All Tier 1 skills (8%)
Active Tier 2 skill (15%)
Headroom (7%)
Loading Pipeline
How an Agent Resolves Skills at Runtime
STARTUP PHASE: Agent Init → Load all Tier 1 metadata (~100 tok/skill) → Skill Index (in memory)
RUNTIME PHASE: User request ("deploy to prod") → Match? (fuzzy search) → Load Tier 2 (<5K tokens) → Execute procedure → Load Tier 3 (only if a script or reference is needed)
No match: fall back to general knowledge.
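The runtime half of the pipeline can be sketched with a naive fuzzy matcher. `difflib` here stands in for whatever matching strategy an agent actually uses, and the 0.3 threshold is arbitrary:

```python
import difflib

def resolve(request, index, threshold=0.3):
    """Runtime phase: fuzzy-match a request against the Tier 1 index.
    Returns the best skill name, or None to fall back to general knowledge."""
    best, best_score = None, 0.0
    for name, desc in index.items():
        score = difflib.SequenceMatcher(
            None, request.lower(), f"{name} {desc}".lower()).ratio()
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None

index = {"deploy": "Deploy a service to production",
         "seo-audit": "Audit a page for SEO issues"}
print(resolve("deploy to prod", index))   # deploy
```

Only after `resolve` returns a name would the agent pay the Tier 2 cost for that one skill.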
Without Tiering
What Goes Wrong
LLMs lose accuracy once context usage passes roughly 80% of the window. Loading 100 untiered skills costs 500K tokens, more than the window itself, so without tiering this failure mode is unavoidable.
Scaling Table
Token Cost at Scale
Skills | Tier 1 Cost | Naive Cost | Savings | Fits 128K?
10     | 1K          | 50K        | 50x     | Both fit
50     | 5K          | 250K       | 50x     | Tiered only
100    | 10K         | 500K       | 50x     | Tiered only
500    | 50K         | 2.5M       | 50x     | Tiered only
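The table rows follow directly from the per-skill figures; a quick generator, using this page's illustrative token averages:

```python
def costs(n_skills, tier1=100, tier2=5_000):
    """Startup token cost with tiering vs. the naive load-everything approach."""
    return n_skills * tier1, n_skills * tier2

for n in (10, 50, 100, 500):
    tiered, naive = costs(n)
    fits = "both fit" if naive <= 128_000 else "tiered only"
    print(f"{n:>4} skills: tiered={tiered:>7,}  naive={naive:>10,}  ({fits})")
```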
Implementation Patterns
Advanced Tiering Strategies
Lazy Loading
Load Tier 2 only when confidence of match exceeds threshold. Avoids loading on partial matches.
LRU Eviction
When multiple skills are loaded in a session, evict least-recently-used Tier 2 content to stay within context budget.
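A sketch of LRU eviction over Tier 2 bodies, tracked in tokens; the `Tier2Cache` class, its budget, and the token counts are all illustrative:

```python
from collections import OrderedDict

class Tier2Cache:
    """Keep loaded skill bodies under a token budget by evicting the
    least-recently-used entry first."""
    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.used = 0
        self._items = OrderedDict()          # name -> (body, tokens)

    def put(self, name, body, tokens):
        if name in self._items:
            self._items.move_to_end(name)
            return
        while self._items and self.used + tokens > self.budget:
            _, (_, freed) = self._items.popitem(last=False)   # evict LRU
            self.used -= freed
        self._items[name] = (body, tokens)
        self.used += tokens

    def get(self, name):
        body, _ = self._items[name]
        self._items.move_to_end(name)        # mark as recently used
        return body

cache = Tier2Cache(budget_tokens=10_000)
cache.put("deploy", "...steps...", 5_000)
cache.put("seo-audit", "...steps...", 5_000)
cache.get("deploy")                          # touch: seo-audit becomes LRU
cache.put("migrate", "...steps...", 5_000)   # over budget -> evicts seo-audit
print(list(cache._items))   # ['deploy', 'migrate']
```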
Pre-fetching
When skill A references skill B, pre-fetch B's Tier 2 content to reduce latency on the next step.
Skill Groups
Bundle related skills into groups. Loading one skill from a group pre-fetches metadata for siblings, improving discovery within a workflow.
03 — Progressive Disclosure · AI Agent Skills · See also: 02 Spec Deep Dive · 04 Knowledge Stack · 05 Security & Trust