Security & Trust — Detail · AI Agent Skills

The ClawHavoc Incident

First Major Supply Chain Attack on AI Agent Skills

Security researchers discovered 1,184 malicious skills on ClawHub, the community registry for OpenClaw skills. The attack, dubbed ClawHavoc, demonstrated full attack chains from skill installation to data exfiltration.

Malicious skills mimicked popular legitimate skills with typosquatted names. Thousands of developers installed them before detection. The incident exposed fundamental gaps in the trust model for shared AI agent skills.

Attack timeline:

1. Attackers published skills with names similar to popular ones.
2. The skills contained hidden instructions in markdown comments.
3. Agents loaded the skills and followed the hidden instructions.
4. MCP tools were called with malicious parameters.
5. Credentials and code were exfiltrated via HTTP requests.
6. Detection took weeks because no scanning infrastructure existed.
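
The first step of the timeline, lookalike names, is one of the easier ones to screen for at publication time. The following is a minimal sketch, assuming a registry keeps a list of popular skill names; the names in POPULAR_SKILLS and the 0.85 threshold are hypothetical and would need tuning against real data.

```python
from difflib import SequenceMatcher

# Hypothetical list of well-known skill names; a real registry would supply this.
POPULAR_SKILLS = ["git-helper", "code-review", "deploy-aws"]


def likely_typosquats(candidate: str, threshold: float = 0.85) -> list[str]:
    """Return popular names the candidate closely resembles but does not equal."""
    hits = []
    for name in POPULAR_SKILLS:
        if candidate == name:
            continue  # An exact match is the real skill, not a lookalike.
        ratio = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
        if ratio >= threshold:
            hits.append(name)
    return hits
```

A registry could hold a submission such as "git-helpr" for manual review when this function returns a non-empty list. String similarity alone will miss some tricks (for example, swapped word order), so this is a first filter rather than a complete check.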

Attack Surface

Why Skills Are Vulnerable

Skills are executable instructions that an AI agent follows with the same authority as user commands. Community registries are open-submission. There is no built-in sandboxing or permission scoping in most implementations.

Executable · Open Registry · No Sandbox · Trust Gap

Scale of Risk

By the Numbers

1,184 malicious skills found

6 distinct attack categories

Thousands of downloads before detection

1. Prompt Injection

Hidden Instructions

Malicious instructions are embedded in the skill body, often in HTML comments or zero-width characters. The agent executes the attacker's commands, believing they are part of the skill procedure.

SEVERITY: CRITICAL
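
Both hiding places named above can be surfaced with a simple static pass over the skill text. This is a minimal sketch, not the method of any particular scanner; the function name scan_skill_body and the finding format are assumptions made for illustration.

```python
import re

# HTML comments are invisible in rendered markdown but still reach the agent.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# Common zero-width code points that can hide text from human reviewers.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def scan_skill_body(text: str) -> list[str]:
    """Return human-readable findings for hidden content in a skill body."""
    findings = []
    for match in HTML_COMMENT.finditer(text):
        comment = match.group(1).strip()
        if comment:
            findings.append(f"html-comment: {comment[:60]}")
    count = len(ZERO_WIDTH.findall(text))
    if count:
        findings.append(f"zero-width-chars: {count}")
    return findings
```

A check like this only reports that hidden content exists. Deciding whether a comment is an attack or a harmless note still needs a reviewer or a more capable classifier.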

2. Tool Poisoning

Legitimate Tools, Malicious Use

Skills misuse MCP tools with malicious parameters. The tool call itself is legitimate (e.g., HTTP request), but the destination or payload is attacker-controlled.

SEVERITY: CRITICAL
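
Because the tool call itself is legitimate, one possible control is to validate its parameters rather than the tool. The sketch below assumes a platform can intercept an HTTP tool call and inspect the destination URL before sending; the hosts in ALLOWED_HOSTS are hypothetical examples.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from policy.
ALLOWED_HOSTS = {"api.github.com", "registry.npmjs.org"}


def is_destination_allowed(url: str) -> bool:
    """Allow only HTTPS requests to an exact allowlisted hostname."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host in ALLOWED_HOSTS
```

Matching on the parsed hostname, rather than searching the raw URL string, is a deliberate choice: a destination such as https://api.github.com.evil.example/ contains an allowed name as a substring but is rejected here.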

3. Malware Delivery

Scripts That Bite

Scripts in the skill's scripts/ directory download and execute malware, leveraging the agent's system access and the user's permissions to install payloads.

SEVERITY: HIGH
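
One common shape of download-and-execute is piping a fetched script straight into a shell. The following heuristic is a rough sketch that flags only that one pattern; the function name is hypothetical, and obfuscated or multi-step downloaders would pass it unnoticed.

```python
import re

# Matches e.g. "curl ... | sh" or "wget ... | sudo bash" on a single line.
PIPE_TO_SHELL = re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b")


def flags_download_and_execute(script: str) -> bool:
    """Return True if the script pipes a downloaded payload into a shell."""
    return PIPE_TO_SHELL.search(script) is not None
```

A pattern check like this is best treated as a cheap pre-filter in front of sandboxed execution, not as a replacement for it.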

4. Credential Leakage

Secrets Stolen Silently

Skills log, transmit, or expose environment variables and API keys. Exfiltration is often hidden in seemingly innocent tool calls or network requests.

SEVERITY: HIGH
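
One way to limit what a skill script can leak is to start it with a reduced environment instead of the agent's full one. This is a minimal sketch; the SAFE_VARS set and the scoped_env helper are assumptions for illustration, not part of any platform.

```python
# Variables most scripts need to run at all; everything else is withheld.
SAFE_VARS = frozenset({"PATH", "HOME", "LANG"})


def scoped_env(
    full_env: dict[str, str],
    extra_allowed: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Return a copy of the environment limited to explicitly allowed names."""
    allowed = SAFE_VARS | extra_allowed
    return {key: value for key, value in full_env.items() if key in allowed}
```

The result can be passed as the env argument of subprocess.run, for example subprocess.run(cmd, env=scoped_env(dict(os.environ))), so that API keys held by the agent process are not inherited by the script unless a skill has been granted them by name.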

[Figure: Attack Flow. How a Malicious Skill Executes]

5. Untrusted Content

Remote Fetch Risks

Skills fetch content from remote URLs without validation. This enables TOCTOU (time-of-check to time-of-use) attacks, in which the content changes between review and execution.

SEVERITY: MEDIUM
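
Pinning closes the gap between review and execution by recording a digest of the content at review time and refusing anything that no longer matches. A minimal sketch, assuming the skill or registry stores a SHA-256 hex digest alongside each remote URL (the verify_pinned name is hypothetical):

```python
import hashlib
import hmac


def verify_pinned(content: bytes, pinned_sha256: str) -> bool:
    """Return True only if the content hashes to the digest recorded at review."""
    actual = hashlib.sha256(content).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, pinned_sha256.lower())
```

The agent would fetch the remote content, call verify_pinned on the bytes it received, and execute or load the content only when the check passes.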

6. Toxic Flows

Innocent Steps, Harmful Result

Each individual step looks legitimate, but chained together the steps form a destructive sequence. These flows are the hardest to detect because no single action triggers an alert.

SEVERITY: MEDIUM
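
Detecting a toxic flow means looking at the order of actions rather than at each action alone. The sketch below shows one simple rule, a sensitive read followed later by an outbound call; the tool names in both sets are hypothetical, and real behavioral analysis would track data flow rather than names alone.

```python
# Hypothetical tool names grouped by role in a possible exfiltration chain.
SENSITIVE_READS = {"read_env", "read_file"}
OUTBOUND = {"http_request", "send_email"}


def has_toxic_flow(tool_calls: list[str]) -> bool:
    """Return True if any outbound call happens after a sensitive read."""
    seen_sensitive = False
    for name in tool_calls:
        if name in SENSITIVE_READS:
            seen_sensitive = True
        elif name in OUTBOUND and seen_sensitive:
            return True
    return False
```

For example, the sequence read_env, format_text, http_request is flagged even though each call would pass an individual check, while http_request followed by read_env is not.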

Mitigation Strategies

Defense-in-Depth Approach

Risk	Mitigation	Tool	Status
Prompt injection	Static analysis of skill body	Snyk Agent Scan	Available
Tool poisoning	Permission boundaries	Platform-level	Partial
Malware	Sandboxed execution	Container isolation	Emerging
Credential leakage	Env var scoping	Secret managers	Partial
Untrusted content	Content pinning + hashing	SRI-style checks	Proposed
Toxic flows	Behavioral analysis	Runtime monitoring	Research

Defense in Depth

Five Layers of Protection

Snyk Scan > Skill Signing > Sandbox > Permissions > Human Review

Snyk Agent Scan

Static analysis tool that scans SKILL.md files for known prompt injection patterns, suspicious tool calls, and credential access.

Skill Signing

Cryptographic verification of skill authorship, similar to GPG signing for Git commits. A valid signature ensures that a skill has not been tampered with after publication.

Sandboxed Execution

Run skill scripts in isolated containers with no network access by default. Skills must explicitly declare the network, filesystem, and tool permissions they need.
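
As a rough illustration of a no-network default, the helper below builds a docker run command line for a skill script. It is a sketch under stated assumptions: the python:3.12-slim image, the /skill mount point, and the sandbox_command name are choices made for this example, and the function only builds the argument list without running anything.

```python
def sandbox_command(
    skill_dir: str, script: str, allow_network: bool = False
) -> list[str]:
    """Build a docker argv that runs a skill script with minimal privileges."""
    cmd = ["docker", "run", "--rm", "--read-only", "--cap-drop", "ALL"]
    if not allow_network:
        # Network is off unless the skill has declared and been granted it.
        cmd += ["--network", "none"]
    cmd += [
        "-v", f"{skill_dir}:/skill:ro",  # Mount the skill read-only.
        "python:3.12-slim",              # Assumed base image for this example.
        "python", f"/skill/scripts/{script}",
    ]
    return cmd
```

The returned list can be handed to subprocess.run. Passing arguments as a list rather than a shell string also avoids shell injection through the skill path or script name.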

Permission Boundaries

Skills declare which MCP tools they need, and the agent enforces least privilege. A deploy skill cannot access email tools. A formatting skill cannot make HTTP requests.
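
Declared permissions only help if the agent checks them on every tool call. A minimal sketch of that check follows; SkillManifest, ToolNotPermitted, and authorize are hypothetical names, and the manifest shape is an assumption rather than an existing skill format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillManifest:
    """The tools a skill has declared it needs (hypothetical format)."""
    name: str
    allowed_tools: frozenset[str] = field(default_factory=frozenset)


class ToolNotPermitted(Exception):
    """Raised when a skill calls a tool it did not declare."""


def authorize(manifest: SkillManifest, tool: str) -> None:
    """Deny by default: any tool not declared in the manifest is refused."""
    if tool not in manifest.allowed_tools:
        raise ToolNotPermitted(f"{manifest.name} may not call {tool}")
```

With this in place, a formatting skill declared as SkillManifest("formatter", frozenset({"read_file", "write_file"})) would pass authorize for read_file and raise ToolNotPermitted for http_request.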

No single layer is sufficient. The combination of static analysis, cryptographic integrity, runtime isolation, permission scoping, and human review creates a layered defense that is much harder to bypass than any individual measure.