05 — Detail · AI Agent Skills

Security & Trust

ClawHavoc, supply chain attacks, and building a safe skill ecosystem

1,184Malicious Skills
6Attack Types
5Mitigations
The ClawHavoc Incident
First Major Supply Chain Attack on AI Agent Skills
Security researchers discovered 1,184 malicious skills on ClawHub, the community registry for OpenClaw skills. The attack, dubbed ClawHavoc, demonstrated full attack chains from skill installation to data exfiltration.

Malicious skills mimicked popular legitimate skills with typosquatted names. Thousands of developers installed them before detection. The incident exposed fundamental gaps in the trust model for shared AI agent skills.
Attack timeline:
  • Attackers published skills with names similar to popular ones
  • Skills contained hidden instructions in markdown comments
  • Agent loaded the skill and followed hidden instructions
  • MCP tools were called with malicious parameters
  • Credentials and code were exfiltrated via HTTP requests
  • Detection took weeks due to no scanning infrastructure
Attack Surface
Why Skills Are Vulnerable
Skills are executable instructions that an AI agent follows with the same authority as user commands. Community registries are open-submission. There is no built-in sandboxing or permission scoping in most implementations.
Executable Open Registry No Sandbox Trust Gap
Scale of Risk
By the Numbers
1,184
malicious skills found
6
distinct attack categories
1000s
of downloads before detection
1. Prompt Injection
Hidden Instructions
Malicious instructions embedded in skill body (often in HTML comments or zero-width characters). Agent executes attacker commands believing they are part of the skill procedure.
SEVERITY: CRITICAL
2. Tool Poisoning
Legitimate Tools, Malicious Use
Skills misuse MCP tools with malicious parameters. The tool call itself is legitimate (e.g., HTTP request), but the destination or payload is attacker-controlled.
SEVERITY: CRITICAL
3. Malware Delivery
Scripts That Bite
Scripts in the skills scripts/ directory that download and execute malware. Leverages the agent's system access and user permissions to install payloads.
SEVERITY: HIGH
4. Credential Leakage
Secrets Stolen Silently
Skills that log, transmit, or expose environment variables and API keys. Exfiltration often hidden in seemingly innocent tool calls or network requests.
SEVERITY: HIGH
Attack Flow
How a Malicious Skill Executes
Developer Installs skill Agent Loads Tier 1 + 2 Hidden Prompt Injected Tool Misuse MCP call Data Exfiltration Credentials, code TRUSTED MALICIOUS CHAIN Steps 3-5 happen silently in a single agent turn Developer sees normal-looking output — no visible indication of compromise
5. Untrusted Content
Remote Fetch Risks
Skills that fetch content from remote URLs without validation. Enables TOCTOU (time-of-check-time-of-use) attacks where content changes between review and execution.
SEVERITY: MEDIUM
6. Toxic Flows
Innocent Steps, Harmful Result
Each individual step looks legitimate. But chained together, they form a destructive sequence. Hardest to detect because no single action triggers alerts.
SEVERITY: MEDIUM
Mitigation Strategies
Defense-in-Depth Approach
RiskMitigationToolStatus
Prompt injectionStatic analysis of skill bodySnyk Agent ScanAvailable
Tool poisoningPermission boundariesPlatform-levelPartial
MalwareSandboxed executionContainer isolationEmerging
Credential leakageEnv var scopingSecret managersPartial
Untrusted contentContent pinning + hashingSRI-style checksProposed
Toxic flowsBehavioral analysisRuntime monitoringResearch
Defense in Depth
Five Layers of Protection
Snyk Scan> Skill Signing> Sandbox> Permissions> Human Review
Snyk Agent Scan
Static analysis tool that scans SKILL.md files for known prompt injection patterns, suspicious tool calls, and credential access.
Skill Signing
Cryptographic verification of skill authorship. Like GPG signing for git commits. Ensures a skill has not been tampered with after publication.
Sandboxed Execution
Run skill scripts in isolated containers with no network access by default. Must explicitly declare network, filesystem, and tool permissions.
Permission Boundaries
Skills declare which MCP tools they need. Agent enforces least-privilege. A deploy skill cannot access email tools. A formatting skill cannot make HTTP requests.
No single layer is sufficient. The combination of static analysis, cryptographic integrity, runtime isolation, permission scoping, and community review creates a layered defense that is much harder to bypass than any individual measure.
05 — Security & Trust · AI Agent Skills · See also: 01 Helicopter · 02 Spec Deep Dive · 03 Progressive Disclosure · 04 Knowledge Stack AI Agent Skills