Infrastructure
Pattern 16 of 26
Sandboxes
Letting agents act without breaking things
If an agent can write and execute code on your machine, it can do essentially anything your machine can do. Sandboxes contain that risk by giving the agent a fully isolated environment: its own filesystem, its own network, its own process space. Inside it, the agent can run arbitrary code, browse the web, modify files, and crash itself repeatedly while your production systems stay untouched. A MicroVM-based sandbox can spin up in 150 milliseconds and disappears when the task finishes.
Why it matters
An agent running in a MicroVM that spins up for a task and then disappears has a genuinely different threat model from an agent running directly on your infrastructure. That is not an abstract point. It is the difference between a crash that ends a session and a crash that corrupts a production database.
Deep Dive
A sandbox is an isolated execution environment where an agent can write and run code, browse the web, modify files, and take actions without directly affecting production systems. Sandboxing has shifted from a security nicety to a fundamental architectural requirement as agents gain more real autonomy. The question has moved past whether to sandbox autonomous agents. It is now which technology to use and how to design the boundary between the sandbox and anything that actually matters.
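The lifecycle described above (create an environment, run the task, capture the result, throw everything away) can be sketched in a few lines of Python. This is only an illustration of the ephemeral lifecycle, not a real sandbox: a temporary directory and a child process provide no filesystem, network, or kernel isolation of the kind a MicroVM does. The names `run_in_throwaway_dir` and `RunResult` are hypothetical.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RunResult:
    exit_code: Optional[int]  # None when the task was killed for running too long
    stdout: str
    stderr: str
    timed_out: bool


def run_in_throwaway_dir(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run a Python snippet in a child process inside a directory that is
    deleted afterwards. NOT a security boundary; lifecycle illustration only."""
    with tempfile.TemporaryDirectory(prefix="agent-task-") as workdir:
        script = Path(workdir) / "task.py"
        script.write_text(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: Python's isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return RunResult(None, "", "", True)
        return RunResult(proc.returncode, proc.stdout, proc.stderr, False)
    # Leaving the `with` block removes the directory and everything written to it.
```

A crash inside the child process shows up as a non-zero `exit_code` in the caller rather than as damage to the caller's own state, which is the property a real sandbox enforces far more strongly.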
E2B provides MicroVM-based sandboxes that spin up in 150 milliseconds and give the agent a full Linux environment: filesystem, network access, code execution, and a browser. Daytona achieves sub-90ms cold starts for development environment use cases. Browserbase has processed over 50 million browser sessions specifically for web-browsing agents. The Fault-Tolerant Sandboxing paper from December 2025 introduced a transactional model where sandbox operations can be committed or rolled back like database transactions, which matters for agents making multi-step changes that might need to be undone.
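To make the commit-or-rollback idea concrete, here is a minimal sketch that applies it to a directory on disk. It is not the mechanism from the Fault-Tolerant Sandboxing paper, whose design is not described here. It assumes the simplest possible approach: stage all changes on a copy, then swap the copy in on success or discard it on failure. `WorkspaceTransaction` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


class WorkspaceTransaction:
    """Stage changes on a copy of a workspace directory.

    Leaving the `with` block normally commits the staged copy;
    leaving it through an exception discards the copy (rollback).
    """

    def __init__(self, workspace):
        self.workspace = Path(workspace)
        self._tmp = None
        self.staging = None

    def __enter__(self) -> Path:
        self._tmp = Path(tempfile.mkdtemp(prefix="sandbox-txn-"))
        self.staging = self._tmp / "staging"
        shutil.copytree(self.workspace, self.staging)
        return self.staging  # the agent works on this copy only

    def __exit__(self, exc_type, exc, tb) -> bool:
        try:
            if exc_type is None:
                # Commit: move the old workspace aside, move the staged copy in.
                shutil.move(str(self.workspace), str(self._tmp / "previous"))
                shutil.move(str(self.staging), str(self.workspace))
            # Rollback needs no action: the original was never modified.
        finally:
            shutil.rmtree(self._tmp, ignore_errors=True)
        return False  # never swallow the exception that caused a rollback
```

A multi-step change then either lands as a whole or leaves no trace:

```python
with WorkspaceTransaction("project") as stage:
    (stage / "config.txt").write_text("new settings")
    # an exception raised anywhere in this block leaves "project" unchanged
```

The two-step swap is not atomic, so a production version would need to handle a failure between the two moves.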
The design question that comes up in every production architecture is where the sandbox boundary actually sits. An agent confined entirely to its sandbox cannot reach production databases or deploy to production infrastructure, which limits what it can accomplish. An agent that can reach through the sandbox to production systems needs careful permission scoping for everything on the other side. Most teams end up with a tiered model: the agent runs in a sandbox, has read access to specific production data sources, and can write to production only through well-defined, validated interfaces that live outside the sandbox and cannot be bypassed from within it.
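The tiered model can be sketched as a small gateway that runs outside the sandbox and is the only path to production. Everything here is hypothetical (`ProductionGateway`, `PolicyError`, the `orders` data, the `close_order` action). Reads are limited to an allowlist of sources, and every write must name a registered action and pass that action's validator.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class PolicyError(Exception):
    """Raised when a request falls outside what the gateway allows."""


@dataclass
class ProductionGateway:
    # Runs outside the sandbox; the agent can reach production only through it.
    readers: Dict[str, Callable[[], Any]]            # allowlisted read sources
    writers: Dict[str, Callable[[dict], None]]       # allowlisted write actions
    validators: Dict[str, Callable[[dict], bool]]    # one validator per action
    audit_log: List[str] = field(default_factory=list)

    def read(self, source: str) -> Any:
        if source not in self.readers:
            self.audit_log.append(f"DENY read {source}")
            raise PolicyError(f"source not readable: {source}")
        self.audit_log.append(f"ALLOW read {source}")
        return self.readers[source]()

    def write(self, action: str, payload: dict) -> None:
        writer = self.writers.get(action)
        validator = self.validators.get(action)
        if writer is None or validator is None:
            self.audit_log.append(f"DENY write {action} (unknown action)")
            raise PolicyError(f"unknown write action: {action}")
        if not validator(payload):
            self.audit_log.append(f"DENY write {action} (invalid payload)")
            raise PolicyError(f"invalid payload for {action}")
        writer(payload)
        self.audit_log.append(f"ALLOW write {action}")


# Illustrative in-memory stand-in for a production data store.
orders = [{"id": 1, "status": "open"}]


def close_order(payload: dict) -> None:
    for order in orders:
        if order["id"] == payload["id"]:
            order["status"] = "closed"


gateway = ProductionGateway(
    readers={"orders": lambda: [dict(o) for o in orders]},  # hands out copies
    writers={"close_order": close_order},
    validators={"close_order": lambda p: isinstance(p.get("id"), int)},
)
```

In a real deployment the gateway would sit behind a network boundary, so the sandboxed agent holds credentials for the gateway and never for the database itself. That separation is what makes the interface impossible to bypass from inside the sandbox.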