Adversa AI — GuardFall: Decades-Old Bash Tricks Bypass Shell Guards in 10 of 11 AI Coding Agents
AI relevance: AI coding agents run shell commands with full developer authority — SSH keys, cloud credentials, everything in $HOME. When pattern-based shell guards fail against 30-year-old Bash quoting tricks, poisoned READMEs and Makefiles become weaponized supply-chain attack vectors.
- Adversa AI discovered a structural security flaw across multiple open-source AI coding agents: pattern-based shell guards that inspect raw command text cannot model Bash's expansion behavior, so they provide confidence without protection.
- The research, dubbed GuardFall, documents how decades-old shell bypasses — quote removal, $IFS spacing, command substitution, base64-to-sh, and destructive argv flags — systematically defeat the guards of 10 out of 11 popular open-source agents tested, including Hermes, OpenCode, Roo-code, and Goose.
- The trigger: a guard inspects raw text, but Bash expands, unquotes, and rewrites text before execution. When an agent processes untrusted content (a poisoned README, Makefile, or MCP server output), injected commands pass all execution filters because the guard sees different text than what Bash actually runs.
- Class E bypasses — alternative argv shapes for the same destructive effect — survive even the strongest tokenized guards because per-flag reasoning requires knowing which flag combinations flip a binary from benign to destructive for each specific command.
- Only Continue defended against the structural majority of bypass cases: "Of 21 bypass cases submitted to the evaluator, 0 reach allowedWithoutPermission, and all 12 canonical-destructive cases are correctly downgraded."
- The attack chain: an engineer uses a vulnerable agent to read a poisoned README or Makefile from a malicious repository → the agent is tricked into silently executing commands that exfiltrate AWS credentials or wipe dev environments → especially dangerous in CI pipelines where auto-yes modes are default.
- While direct malicious prompts are often refused by frontier models, an agent ingesting operational context (like a poisoned README) can trick models into emitting injected commands as routine tasks. The researchers used Claude Sonnet 4.6 in live runs.
- The core problem: false confidence from string-matching guards leads teams to disable human-in-the-loop checks and enable auto-mode, creating the exact conditions for supply-chain compromise.
Why it matters
This isn't a novel exploit technique — it's a 30-year-old class of shell bypasses being applied to a new attack surface. The fact that 10 of 11 agents fail suggests the industry is shipping pattern-matching guards that create an illusion of safety while leaving the structural vulnerability wide open. With AI coding agents running in developer environments with full credential access, the blast radius of a single poisoned dependency is catastrophic.
What to do
- If you're running AI coding agents in production, audit whether your guard is pattern-based (string matching) or tokenized. Pattern-based guards are definitively broken against this class.
- Disable auto-execute mode for any agent processing untrusted content. Human-in-the-loop approval is the only reliable defense until agents implement proper shell expansion modeling.
- Consider Continue's architecture as a reference design — it's the only agent that closed the structural majority of bypass surface in default IDE mode.
- For CI/CD pipelines: treat AI agent execution with the same caution as arbitrary code execution from untrusted sources. Sandbox aggressively, rotate credentials frequently, and never run agents with production credentials in disposable environments.
- Monitor for Class E bypass attempts specifically — alternative argv shapes that survive even tokenized guards require per-binary flag reasoning that most agents don't implement.