Agent Sandboxes Are the Containment Boundary
Approval prompts slow risky actions down, but they do not contain them. Only OS-level sandboxes limit the blast radius when agents act.
Executive Summary
Approval prompts are a human policy layer. Sandboxes are a technical enforcement layer. The safest agent workflows use both: approvals for intent, sandboxes for containment. Codex CLI and Claude Code both moved in this direction, combining OS-level isolation with configurable approval policies.
Approvals Are a Speed Bump, Not a Guardrail
Approval prompts are a speed bump, not a containment boundary. Developers want to move fast, approvals create fatigue, and routine tasks start to look safe. That is why teams ask for allowlists for repetitive commands, and why vendors are shifting toward sandboxes that make the safe path the default.
Two Layers of Control: Policy and Containment
That's why vendors now split intent from containment. OpenAI documents Codex security as two layers that work together: the sandbox (what is technically possible) and the approval policy (when the agent must ask). In the Codex CLI, the default is an OS-enforced sandbox with network access disabled and write access limited to the workspace, while approval modes decide when the agent pauses for confirmation.
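The split can be modeled as a simple decision function: the sandbox determines what is technically possible, and the approval policy only decides when to pause and ask. This is an illustrative sketch of that layering, not OpenAI's implementation; the names `SandboxMode`, `Action`, and `decide` are our own.

```python
from dataclasses import dataclass
from enum import Enum

class SandboxMode(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    FULL_ACCESS = "danger-full-access"

@dataclass
class Action:
    writes: bool            # does the command modify files?
    inside_workspace: bool  # is the target path under a writable root?
    uses_network: bool

def decide(action: Action, sandbox: SandboxMode, ask_before_escalating: bool = True) -> str:
    """Containment first, policy second: the sandbox hard-blocks, approvals only gate."""
    if sandbox is SandboxMode.FULL_ACCESS:
        return "run"
    # Layer 1: the sandbox defines what is technically possible.
    blocked = (
        (action.writes and sandbox is SandboxMode.READ_ONLY)
        or (action.writes and not action.inside_workspace)
        or action.uses_network
    )
    if not blocked:
        return "run"
    # Layer 2: the approval policy decides when to ask the human.
    return "ask" if ask_before_escalating else "deny"
```

In this model a workspace edit runs without interruption, while a network call or an out-of-workspace write escalates to the user, which mirrors the Auto-mode behavior described above.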
Codex CLI Has Two Dials
- Approval modes: Auto (default), Read-only, Full Access (how often Codex asks).
- Sandbox modes: read-only, workspace-write, danger-full-access (what Codex can do).
In Auto, Codex can read, edit, and run commands inside the working directory, then asks
before touching network or outside paths. Use /status to verify writable roots
and /approvals to adjust the policy mid-session.
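Both dials can also be pinned in Codex's configuration file rather than chosen per session. A hedged sketch of `~/.codex/config.toml` follows; the key names match OpenAI's published config documentation at the time of writing, but verify them against your installed version:

```toml
# Ask before escalating; confine writes to the workspace.
approval_policy = "on-request"
sandbox_mode = "workspace-write"

[sandbox_workspace_write]
network_access = false
```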
What the Sandbox Actually Does
In Codex CLI, the sandbox is not just a UI prompt. It is enforced by the OS. OpenAI documents
macOS enforcement via Seatbelt using sandbox-exec, and Linux enforcement via a
combination of Landlock and seccomp. The same policy applies to every command the agent
spawns, not just the primary process. On Windows, Codex uses the Linux sandbox when running
inside WSL and offers an experimental native sandbox for non-WSL setups.
Landlock vs seccomp (Linux)
Landlock is a Linux Security Module for unprivileged access control, while seccomp is designed for syscall filtering. They solve different problems and work best together: Landlock draws an access boundary, seccomp narrows which syscalls can even be attempted.
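The division of labor can be illustrated with a toy model in pure Python (not the actual kernel APIs): one check is path-based like Landlock, the other is syscall-based like seccomp, and a request must pass both.

```python
from pathlib import PurePosixPath
from typing import Optional

# Landlock-style rule: which filesystem subtrees may be written (the access boundary).
WRITABLE_ROOTS = [PurePosixPath("/workspace"), PurePosixPath("/tmp")]

# seccomp-style rule: which syscalls may be attempted at all.
ALLOWED_SYSCALLS = {"read", "write", "openat", "close", "exit_group"}

def path_allowed(path: str) -> bool:
    p = PurePosixPath(path)
    return any(root == p or root in p.parents for root in WRITABLE_ROOTS)

def permitted(syscall: str, path: Optional[str] = None) -> bool:
    if syscall not in ALLOWED_SYSCALLS:       # seccomp narrows the syscall surface
        return False
    if syscall in {"write", "openat"} and path is not None:
        return path_allowed(path)             # Landlock draws the access boundary
    return True
```

In this toy, `permitted("connect")` fails at the seccomp layer before any path is even considered, while `permitted("openat", "/etc/passwd")` passes the syscall filter but is stopped at the access boundary.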
Seatbelt (macOS)
Codex relies on macOS Seatbelt via sandbox-exec for OS-level enforcement. This
is the same class of primitives used by other agent tools that need containment without a
full container.
What Other Vendors Are Doing
Anthropic reports a similar conclusion: permission prompts alone do not scale. Claude Code
now uses sandboxing to enforce filesystem and network isolation, with internal usage showing
an 84% reduction in approval prompts. Their write-up describes a sandbox runtime built on
OS-level primitives like Linux bubblewrap and macOS Seatbelt, and the Claude docs position
sandboxing as the default way to reduce approval fatigue while keeping tool access bounded.
Claude Code also gates network access through a proxy with domain allowlists, and enables
sandboxing via the /sandbox command.
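A proxy-side domain allowlist is conceptually simple: extract the destination host from each request and match it against approved domains and their subdomains. The sketch below is our own illustration of that check, not Anthropic's proxy, and the allowlist entries are examples, not defaults.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"github.com", "pypi.org"}  # example allowlist, not a vendor default

def host_allowed(url: str) -> bool:
    """Allow a URL only if its host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Note the subdomain check requires a leading dot: `api.github.com` is allowed, but a lookalike such as `evil-github.com` is not, because it does not end with `.github.com`.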
Real-World Friction Points
Even with better defaults, teams still hit sharp edges. The community experience is mixed. Some Cursor users ask for allowlists to auto-run safe commands. Some Claude Code users report confusion when sandbox settings appear to be bypassed by fallback flows. OpenAI also notes that Codex sandboxing can fail inside containers that do not support Landlock or seccomp, in which case you should rely on the container boundary and run Codex with a full-access sandbox mode inside that container. The consistent theme is that sandboxes are necessary but not effortless.
Operational Playbook
- Make sandboxing the default: containers or OS sandboxes for all agent runs.
- Keep network off unless needed: use explicit allowlists when supported.
- Expose context clearly: surface the active sandbox and approval mode via /status and /approvals.
- Test the sandbox: use Codex sandbox or debug commands to validate what is blocked.
- Expect edge cases: missing kernel features, CLI fallbacks, and allowlist gaps happen.
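The "test the sandbox" step can be automated with a small probe that attempts a write in each location of interest and reports whether the boundary held. This is a generic sketch to run inside whatever sandbox you are validating, not a Codex-specific tool:

```python
import os
import tempfile

def probe_write(directory: str) -> bool:
    """Return True if a file can be created in `directory` (the sandbox allowed it)."""
    target = os.path.join(directory, ".sandbox_probe")
    try:
        with open(target, "w") as f:
            f.write("probe")
        os.remove(target)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    for location in (os.getcwd(), tempfile.gettempdir(), "/etc"):
        verdict = "writable" if probe_write(location) else "blocked"
        print(f"{location}: {verdict}")
```

Running this under workspace-write should show the working directory as writable and system paths as blocked; if a system path comes back writable, the sandbox is not doing what you think it is.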
How AARSM Helps
AARSM sits between the model and the tools, so a prompt cannot sidestep the sandbox. We enforce boundaries, block risky calls, and log every attempt to cross them.
About This Analysis
This analysis draws on OpenAI Codex documentation, Linux kernel documentation for Landlock and seccomp, Anthropic's sandboxing write-up and Claude Code docs, and community reports from Reddit.
Related Articles
Clicking "Yes" to AI Disaster
Why approval fatigue is the next major security crisis for AI workflows.
MCP Tool Poisoning: When Tool Descriptions Become a Data Exfiltration Path
How tool metadata can inject instructions and leak sensitive data.