01 — Helicopter View · AI Agent Safety

Agent Safety: The New Security Boundary

Production permissions for AI agents with tool access

6 Risk Types
4 Permission Tiers
1-in-6 Bypass Rate
The Incident
DROP DATABASE
AI coding agent told to "clean up the staging environment." It ran DROP DATABASE on production. No confirmation gate. No sandbox. No rollback. $2.3M average cost per AI-caused incident.
Total Data Loss
What Is It
Agent Permissions as Security Boundary
AI agents have direct tool access: file systems, databases, APIs, shell commands. But there is no standardized permission model. The gap between what an agent can do and what it should do is the new attack surface. Stanford found 1 in 6 agents bypass safety instructions when pressured. 47% of public agent skills contain prompt injection payloads. This is not a theoretical risk. It is happening now.
Tool Access · No Permission Model · New Attack Surface
Mental Model
"Interns with Root Access"
Capable, fast, eager to help. Will also rm -rf / if they think you asked. The fix is not removing the intern. It is removing root access and adding supervised permission tiers.
Permission Tiers
4-Level Access
Tier | Scope | Examples | Gate
T0 | Read | List files, SELECT queries, GET requests | Auto
T1 | Write | Create files, INSERT/UPDATE, POST | Soft Gate
T2 | Exec | Shell commands, migrations, deploys | Hard Gate
T3 | Admin | DROP, rm -rf, IAM, secrets access | Confirm
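Before any gate can apply, each tool call has to be assigned one of these tiers. The Python below is a minimal sketch under stated assumptions: the tool names (fs.list, fs.read, fs.write, http, sql, shell, iam, secrets) and the classify function are hypothetical, and simple keyword matching stands in for real SQL and shell parsing. Its main design choice is to fail closed: anything it does not recognize is escalated to T2 or T3, never auto-approved as T0.

```python
import shlex
from enum import IntEnum


class Tier(IntEnum):
    """Permission tiers from the table; a higher value means more danger."""
    T0_READ = 0
    T1_WRITE = 1
    T2_EXEC = 2
    T3_ADMIN = 3


def _sql_tier(statement: str) -> Tier:
    """Tier a SQL string by the leading verb of each ';'-separated statement."""
    tier = Tier.T0_READ
    for part in statement.split(";"):  # naive split: only ever over-escalates
        words = part.split()
        if not words:
            continue  # empty fragment, e.g. after a trailing ';'
        verb = words[0].upper()
        if verb == "SELECT":
            part_tier = Tier.T0_READ
        elif verb in {"INSERT", "UPDATE"}:
            part_tier = Tier.T1_WRITE
        elif verb in {"DROP", "TRUNCATE", "GRANT", "REVOKE"}:
            part_tier = Tier.T3_ADMIN  # destructive or IAM-like statements
        else:
            part_tier = Tier.T2_EXEC  # unknown verbs and DDL: fail closed
        tier = max(tier, part_tier)
    return tier


def _shell_tier(command: str) -> Tier:
    """Every shell command is at least T2; a recursive rm is T3."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Tier.T3_ADMIN  # unparseable input: treat as most dangerous
    if tokens and tokens[0] == "rm" and any(
        t == "--recursive"
        or (t.startswith("-") and not t.startswith("--") and "r" in t.lower())
        for t in tokens[1:]
    ):
        return Tier.T3_ADMIN  # e.g. rm -rf /
    return Tier.T2_EXEC


def classify(tool: str, arg: str = "") -> Tier:
    """Map a (hypothetical) tool name and its argument to a permission tier."""
    if tool in {"fs.list", "fs.read"}:
        return Tier.T0_READ
    if tool == "fs.write":
        return Tier.T1_WRITE
    if tool == "http":
        words = arg.split()
        method = words[0].upper() if words else ""
        if method == "GET":
            return Tier.T0_READ
        if method == "POST":
            return Tier.T1_WRITE
        return Tier.T2_EXEC  # methods not in the table are escalated
    if tool == "sql":
        return _sql_tier(arg)
    if tool == "shell":
        return _shell_tier(arg)
    if tool in {"iam", "secrets"}:
        return Tier.T3_ADMIN
    return Tier.T2_EXEC  # unknown tool: never auto-approve
```

String heuristics like these are easy to evade (sudo rm -rf / lands in T2 here, not T3), which is why the fallback is a hard gate rather than approval.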
Human-in-the-Loop · Gate Strategy · Session-Scoped
Sandbox Isolation · Contain the Blast · Defense in Depth
Audit Trails · Immutable Logs · Compliance
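One way to read "session-scoped" human-in-the-loop gating is that a human approval covers one tool on one target, and expires with the session or after a timeout. The sketch below assumes exactly that; ApprovalSession, its methods, and the 15-minute default are hypothetical. It also assumes T3 actions are never cached and must be confirmed on every call.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ApprovalSession:
    """Human approvals that live only for one agent session (hypothetical design)."""
    session_id: str
    ttl_seconds: float = 900.0
    # (tool, scope) -> monotonic time at which the approval expires
    _granted: Dict[Tuple[str, str], float] = field(
        default_factory=dict, init=False, repr=False
    )

    def grant(self, tool: str, scope: str, tier: int, now: Optional[float] = None) -> None:
        """Record that a human approved `tool` on `scope` for this session."""
        if tier >= 3:
            raise ValueError("T3 actions need explicit confirmation on every call")
        now = time.monotonic() if now is None else now
        self._granted[(tool, scope)] = now + self.ttl_seconds

    def is_approved(self, tool: str, scope: str, now: Optional[float] = None) -> bool:
        """True only for an unexpired approval of exactly this tool and scope."""
        now = time.monotonic() if now is None else now
        expiry = self._granted.get((tool, scope))
        return expiry is not None and now < expiry

    def end(self) -> None:
        """Ending the session revokes every approval granted in it."""
        self._granted.clear()
```

Keying approvals on the target scope matters for the incident above: an approval for "staging" never carries over to "production".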
Attack Surface Comparison
Traditional App vs AI Agent
Dimension | Traditional App | AI Agent
Input surface | HTTP requests, form data | Natural language (unbounded)
Execution scope | Predefined code paths | Dynamic tool selection
Permission model | RBAC, OAuth scopes | Often none
Failure mode | Crash, error response | Confident wrong action
Audit trail | Access logs, APM | Often missing
Rollback | DB transactions | Difficult (side effects)
Supply chain | npm/pip packages | Skills, MCP servers
Attack vector | SQLi, XSS | Prompt injection, tool poisoning
Key Decisions
When to Gate, What to Sandbox
Gate Decisions
  • Read ops: auto-approve always
  • Write ops: log, soft gate in prod
  • Shell/exec: always hard gate
  • Destructive: deny by default, allow only on explicit confirmation
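The gate decisions above amount to a small lookup from tier and environment to an action. A minimal Python sketch, assuming tiers are the integers 0 to 3 from the tier table and that the environment is a plain string; the Decision names, the decide function, and the exact meaning of "soft gate" (logged and surfaced for lightweight approval) are assumptions, not a defined standard.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                  # run now (still sandboxed and audited)
    ALLOW_AND_LOG = "allow_and_log"  # run now, record for later review
    SOFT_GATE = "soft_gate"          # log and ask for lightweight approval (assumed)
    HARD_GATE = "hard_gate"          # block until a human approves
    DENY = "deny"                    # refuse outright


def decide(tier: int, environment: str, explicitly_confirmed: bool = False) -> Decision:
    """Apply the gate decisions to a tier (0-3) and a target environment."""
    if tier == 0:
        return Decision.ALLOW  # read ops: auto-approve always
    if tier == 1:
        # write ops: always logged, soft-gated only in production
        if environment == "production":
            return Decision.SOFT_GATE
        return Decision.ALLOW_AND_LOG
    if tier == 2:
        return Decision.HARD_GATE  # shell/exec: always hard gate
    if tier == 3:
        # destructive: deny unless a human explicitly confirmed this exact call
        return Decision.ALLOW if explicitly_confirmed else Decision.DENY
    raise ValueError(f"unknown tier: {tier!r}")
```

Raising on an unknown tier, rather than defaulting to ALLOW, keeps the function deny-by-default if a new tier is ever added without updating it.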
Sandbox Rules
  • All tool calls in containers
  • No direct production network
  • Separate read/write credentials
  • Kill switch at infra level
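The sandbox rules map naturally onto container flags. The sketch below builds a docker run argument list in Python, assuming Docker is the container runtime; the flags are standard docker run options, while sandbox_argv, the credential variable names, and the label are hypothetical. It defaults to no network at all; a tool that genuinely needs a database would be given an explicitly chosen, isolated network, never host networking.

```python
import shlex
from typing import List

# Hypothetical names: substitute your own credential variables and label.
READ_CREDENTIAL_VAR = "AGENT_READ_TOKEN"
WRITE_CREDENTIAL_VAR = "AGENT_WRITE_TOKEN"
SANDBOX_LABEL = "agent-sandbox=1"


def sandbox_argv(image: str, command: List[str], tier: int, network: str = "none") -> List[str]:
    """Build a `docker run` argv that runs one tool call in a locked-down container."""
    if network == "host":
        raise ValueError("host networking would bypass the sandbox")
    # Separate read/write credentials: pass through only the one this tier needs.
    credential = READ_CREDENTIAL_VAR if tier == 0 else WRITE_CREDENTIAL_VAR
    return [
        "docker", "run", "--rm",
        "--network", network,                   # default: no network at all
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid privilege escalation
        "--pids-limit", "256",                  # cap process count
        "--memory", "512m",                     # cap memory
        "--label", SANDBOX_LABEL,               # lets infra find and kill sandboxes
        "--env", credential,                    # value taken from host env, if set
        image, *command,
    ]


if __name__ == "__main__":
    print(shlex.join(sandbox_argv("alpine:3", ["ls", "/"], tier=0)))
```

The label is what makes an infra-level kill switch possible: an operator or watchdog outside the agent can list every sandbox with docker ps -q --filter label=agent-sandbox=1 and pass the IDs to docker kill, without the agent's cooperation.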
Anti-Patterns
What Not to Do
Permission Evaluation Pipeline
From Tool Call to Audited Execution
Tool call (request) → Classify (assign tier) → Tier gate → Sandbox (execute) → Audit log (immutable) → Result (return)
Tier gates:
  • T0 Read-Only: auto-approve
  • T1 Write: log, soft gate
  • T2 Execute: human, hard gate
  • T3 Admin: explicit confirmation; if not given, deny
Principle: Least privilege · Separate credentials · Immutable audit · Kill switch at infra level
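The pipeline's audit stage is labeled immutable. A common way to make an application-level log at least tamper-evident is hash chaining: each entry stores the hash of the previous one. The Python below is a minimal sketch under that assumption; AuditLog and its field names are hypothetical. The chain only detects tampering; actual immutability still requires shipping entries to write-once storage the agent cannot reach.

```python
import hashlib
import json
import time
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: Dict[str, Any]) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained audit log (tamper-evident, not tamper-proof)."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, Any]] = []

    def append(self, tool: str, tier: int, decision: str,
               ts: Optional[float] = None) -> Dict[str, Any]:
        """Add an entry that commits to the hash of the entry before it."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        record = {"ts": time.time() if ts is None else ts, "tool": tool,
                  "tier": tier, "decision": decision, "prev": prev}
        entry = {**record, "hash": _digest(record)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain. Edits, reordering, or deleting a middle entry
        break it; truncating the tail does not, so anchor the latest hash
        somewhere the agent cannot write."""
        prev = GENESIS
        for entry in self._entries:
            record = {k: v for k, v in entry.items() if k != "hash"}
            if record["prev"] != prev or _digest(record) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```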
01 — Helicopter · AI Agent Safety · sangampandey.info