AI Agent Safety: Production Permissions

The Incident

DROP DATABASE

An AI coding agent was told to "clean up the staging environment." It ran DROP DATABASE on production. No confirmation gate. No sandbox. No rollback. Average cost per AI-caused incident: $2.3M.

Total Data Loss

What Is It

Agent Permissions as Security Boundary

AI agents have direct tool access: file systems, databases, APIs, shell commands. But there is no standardized permission model. The gap between what an agent can do and what it should do is the new attack surface. Stanford found 1 in 6 agents bypass safety instructions when pressured. 47% of public agent skills contain prompt injection payloads. This is not a theoretical risk. It is happening now.

Tool Access | No Permission Model | New Attack Surface

Mental Model

"Interns with Root Access"

Capable, fast, eager to help. Will also rm -rf / if they think you asked. The fix is not removing the intern. It is removing root access and adding supervised permission tiers.

Permission Tiers

4-Level Access

Tier	Access	Example operations	Gate
T0	Read	List files, SELECT queries, GET requests	Auto
T1	Write	Create files, INSERT/UPDATE, POST	Soft Gate
T2	Exec	Shell commands, migrations, deploys	Hard Gate
T3	Admin	DROP, rm -rf, IAM, secrets access	Confirm
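
A minimal sketch of how a SQL tool call might be mapped onto the four tiers. The regex rules, the `Tier` enum, and `classify_sql` are hypothetical names chosen for illustration. Treating ALTER/CREATE as T2 (migrations) is an assumption, and anything unrecognized fails closed to T3. The semicolon split is deliberately naive; a real system would use a proper SQL parser.

```python
import re
from enum import IntEnum


class Tier(IntEnum):
    READ = 0   # T0: list files, SELECT, GET
    WRITE = 1  # T1: create files, INSERT/UPDATE, POST
    EXEC = 2   # T2: shell commands, migrations, deploys
    ADMIN = 3  # T3: DROP, rm -rf, IAM, secrets access


# Hypothetical keyword rules (assumption: ALTER/CREATE count as migrations).
_SQL_RULES = [
    (re.compile(r"^\s*(DROP|TRUNCATE|GRANT|REVOKE)\b", re.I), Tier.ADMIN),
    (re.compile(r"^\s*(ALTER|CREATE)\b", re.I), Tier.EXEC),
    (re.compile(r"^\s*(INSERT|UPDATE)\b", re.I), Tier.WRITE),
    (re.compile(r"^\s*SELECT\b", re.I), Tier.READ),
]


def _classify_one(statement: str) -> Tier:
    for pattern, tier in _SQL_RULES:
        if pattern.match(statement):
            return tier
    return Tier.ADMIN  # fail closed: unknown statements get the strictest tier


def classify_sql(sql: str) -> Tier:
    """Return the highest tier among all statements in `sql`."""
    statements = [s for s in sql.split(";") if s.strip()]
    if not statements:
        return Tier.ADMIN  # nothing recognizable, so fail closed
    return max(_classify_one(s) for s in statements)
```

Taking the maximum over all statements means a harmless SELECT cannot be used to smuggle a DROP through at T0.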

Human-in-the-Loop

Gate Strategy

Auto-approve: read operations, safe queries
Soft gate: log + notify async, continue
Hard gate: pause, present to human, wait
Deny by default: destructive ops require explicit opt-in
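
The four gate behaviors can be sketched as a single wrapper around the tool action. Everything here is illustrative: `Gate`, `run_gated`, and the `notify` / `ask_human` callbacks are hypothetical names, and the choice to still ask a human after a destructive op has been opted in is an assumption, not something the strategy above specifies.

```python
from enum import Enum
from typing import Callable


class Gate(Enum):
    AUTO = "auto"  # run immediately
    SOFT = "soft"  # log + notify, then continue
    HARD = "hard"  # pause until a human approves
    DENY = "deny"  # refused unless explicitly opted in


def run_gated(
    gate: Gate,
    action: Callable[[], str],
    notify: Callable[[str], None],
    ask_human: Callable[[str], bool],
    description: str,
    opted_in: bool = False,
) -> str:
    """Apply the gate, then run `action` only if the gate allows it."""
    if gate is Gate.DENY and not opted_in:
        raise PermissionError(f"denied by default: {description}")
    if gate is Gate.SOFT:
        # In a real system this would enqueue an async notification.
        notify(description)
    elif gate in (Gate.HARD, Gate.DENY):
        # Blocks until the human answers (assumption: opted-in
        # destructive ops are still confirmed by a human).
        if not ask_human(description):
            raise PermissionError(f"rejected by human: {description}")
    return action()
```

Raising `PermissionError` rather than returning a status keeps a denied call from being mistaken for a successful one by the calling agent loop.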

Session-Scoped

Sandbox Isolation

Contain the Blast

Ephemeral containers per tool call
No direct network to production
Read-only file system mounts
Time-boxing: kill long processes
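
One way to sketch these four rules, assuming Docker is the container runtime (the source does not name one). `sandbox_command` and `run_sandboxed` are hypothetical helpers; the image name, mount path, and 30-second default are placeholders.

```python
import subprocess
import uuid
from pathlib import Path


def sandbox_command(
    image: str, workspace: Path, tool_argv: list[str], name: str
) -> list[str]:
    """Build a `docker run` argv for a single tool call."""
    return [
        "docker", "run",
        "--rm",                # ephemeral: container is removed on exit
        "--name", name,        # named so it can be killed on timeout
        "--network", "none",   # no network, hence no route to production
        "--read-only",         # read-only root file system
        "--volume", f"{workspace}:/workspace:ro",  # read-only mount
        "--workdir", "/workspace",
        image,
        *tool_argv,
    ]


def run_sandboxed(
    image: str, workspace: Path, tool_argv: list[str], timeout_s: float = 30.0
) -> subprocess.CompletedProcess:
    """Run one tool call in a fresh container, killing it after `timeout_s`."""
    name = f"agent-tool-{uuid.uuid4().hex[:12]}"
    argv = sandbox_command(image, workspace, tool_argv, name)
    try:
        return subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The timeout only stops the docker CLI client, so kill the
        # container itself before re-raising.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise
```

A tool that genuinely needs network or write access would get a narrower exception (for example a dedicated network or a scratch volume) rather than having these flags dropped.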

Defense in Depth

Audit Trails

Immutable Logs

Every tool call: args, reasoning, result
Append-only, agent cannot modify
User session + timestamp on each entry
Without an audit trail: 14-day median time to detection
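
A minimal sketch of a tamper-evident audit log, using a SHA-256 hash chain as one possible technique (the source does not prescribe a mechanism). `AuditLog` and its field names are hypothetical. A hash chain only detects modification; actual immutability still requires storing the log where the agent has no write access.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

_GENESIS = "0" * 64  # prev_hash of the first entry


def _digest(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, session: str, tool: str, args: dict,
               reasoning: str, result: str) -> dict:
        """Record one tool call, chained to the previous entry's hash."""
        record = {
            "session": session,
            "timestamp": time.time(),
            "tool": tool,
            "args": args,          # must be JSON-serializable
            "reasoning": reasoning,
            "result": result,
            "prev_hash": self.entries[-1]["hash"] if self.entries else _GENESIS,
        }
        record["hash"] = _digest(record)  # hash covers all fields above
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Return True if no entry was altered, removed, or reordered."""
        prev = _GENESIS
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev_hash"] != prev or _digest(body) != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

Because each entry commits to the previous hash, editing or deleting any earlier record invalidates every entry after it.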

Compliance

Attack Surface Comparison

Traditional App vs AI Agent

Dimension	Traditional App	AI Agent
Input surface	HTTP requests, form data	Natural language (unbounded)
Execution scope	Predefined code paths	Dynamic tool selection
Permission model	RBAC, OAuth scopes	Often none
Failure mode	Crash, error response	Confident wrong action
Audit trail	Access logs, APM	Often missing
Rollback	DB transactions	Difficult (side effects)
Supply chain	npm/pip packages	Skills, MCP servers
Attack vector	SQLi, XSS	Prompt injection, tool poisoning

Key Decisions

When to Gate, What to Sandbox

Gate Decisions

Read ops: auto-approve always
Write ops: log, soft gate in prod
Shell/exec: always hard gate
Destructive: deny by default

Sandbox Rules

All tool calls in containers
No direct production network
Separate read/write credentials
Kill switch at infra level

Anti-Patterns

What Not to Do

YOLO mode (--dangerously-skip-permissions)
Blanket tool approval at session start
No rollback capability for agent changes
Trusting all community MCP skills
Shared credentials with prod app
No kill switch for runaway agents

Permission Evaluation Pipeline

From Tool Call to Audited Execution
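
The stages of such a pipeline are not spelled out here, so the following is a sketch under the assumption that it runs classify, approve, execute, and audit in that order. `evaluate_tool_call` and the four callbacks are hypothetical; each callback stands in for a real component (tier classifier, gate, sandbox, audit log).

```python
from typing import Any, Callable


def evaluate_tool_call(
    call: dict,
    classify: Callable[[dict], int],
    approve: Callable[[dict, int], bool],
    execute: Callable[[dict], Any],
    audit: Callable[[dict], None],
) -> Any:
    """Classify, gate, execute, and always audit a single tool call."""
    tier = classify(call)
    record = {"call": call, "tier": tier, "approved": False, "result": None}
    try:
        record["approved"] = approve(call, tier)
        if not record["approved"]:
            raise PermissionError(
                f"tier T{tier} call not approved: {call['tool']}"
            )
        record["result"] = execute(call)
        return record["result"]
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        # Denied and failed calls are audited too, not only successes.
        audit(record)
```

Writing the audit record in `finally` means a rejected or crashing call still leaves a trace, which is the case an investigator most needs.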

Agent Safety: The New Security Boundary