CSO: Treat AI Models as Untrusted Components, Google and UCSD Researchers Argue

AI relevance: When AI agents connect to enterprise tools, memory, APIs, and browsers, the model itself must be treated as an untrusted process — security properties need enforcement at the system wrapper layer, not inside the model's prompt context.

Key Points

  • Researchers at Google, UCSD, UW-Madison, and other institutions published a paper arguing that the AI model powering an agent should be treated as an untrusted component, analogous to how an operating system treats user processes.
  • The paper analyzed eleven real-world attacks on AI agents, including data exfiltration from the ChatGPT macOS app, a Claude Code exfiltration flaw, a Microsoft Copilot vulnerability, and AgentFlayer targeting Cursor via malicious Jira tickets.
  • Every one of the eleven attacks violated the secure information flow principle; most also violated the least privilege principle.
  • The authors distilled five principles from decades of systems security: least privilege, tamper resistance of the trusted computing base, complete mediation, secure information flow, and accounting for the human as a weak link.
  • They explicitly rejected the idea that stacking ML guardrails amounts to "true defense-in-depth," noting that guard models often share the same statistical failure modes as the primary agents they monitor.
  • The authors identified three open research problems: separating instructions from data in the token stream, generating verifiable least-privilege policies from natural-language specs, and controlling information flow through models.
  • The paper also references a complementary study (arXiv:2605.17380) proposing an "Agentic Detection and Response" (ADR) framework, arguing traditional EDR platforms cannot inspect agent reasoning flows, prompt chains, or dynamic tool execution.
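
To make the secure information flow principle from the list above more concrete, here is a minimal sketch of a two-level label check enforced outside the model. It is not taken from the paper: the names `Level`, `Labeled`, `combine`, and `send_external` are hypothetical, and a real agent runtime would need far finer-grained labels. The idea shown is only that data derived from a secret keeps the secret label, and an external sink refuses anything above public.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Two-level sensitivity lattice (an assumption for illustration)."""
    PUBLIC = 0
    SECRET = 1


@dataclass(frozen=True)
class Labeled:
    """A value paired with its sensitivity label (hypothetical helper)."""
    value: str
    level: Level


class FlowViolation(Exception):
    """Raised when labeled data would reach a sink it must not reach."""


def combine(*parts: Labeled) -> Labeled:
    """Join values; the result carries the highest label among the inputs."""
    level = max((p.level for p in parts), default=Level.PUBLIC)
    return Labeled(" ".join(p.value for p in parts), level)


def send_external(payload: Labeled, outbox: list) -> None:
    """Hypothetical external sink; the check runs in the wrapper, not the model."""
    if payload.level > Level.PUBLIC:
        raise FlowViolation("secret-derived data blocked at external sink")
    outbox.append(payload.value)


if __name__ == "__main__":
    outbox = []
    ticket = Labeled("Summarize this ticket", Level.PUBLIC)
    api_key = Labeled("sk-example", Level.SECRET)  # placeholder, not a real key

    send_external(ticket, outbox)  # allowed: public data only
    try:
        send_external(combine(ticket, api_key), outbox)
    except FlowViolation as err:
        print("blocked:", err)
```

Because the label travels with the data rather than depending on what the model decides to do, an injected instruction that persuades the model to include the key in an outbound message would still be stopped at the sink in this sketch.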

Why It Matters

For the past two years, many enterprises have assumed that better models, alignment techniques, and prompt defenses would eventually make AI systems secure enough for production. The paper argues that this assumption is fundamentally misaligned with how agents operate: they combine reasoning, autonomy, memory, and tool execution in a single layer, which makes them closer to operating environments than to conventional applications. Under that view, prompt injection stops being a content-manipulation issue and becomes a workflow-execution and systems-integrity problem that can influence downstream actions across interconnected enterprise systems.

What to Do

  • Shift from model-centric to system-centric security: enforce runtime isolation, containment boundaries, and least-privilege execution around agents.
  • Implement workflow observability controls that track agent reasoning, tool invocations, and memory interactions.
  • Don't rely on stacked ML guardrails as your sole defense; guard models often share failure modes with the agents they monitor.
  • Explore agent-specific detection frameworks (ADR) rather than retrofitting traditional EDR for AI workloads.
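
The first two recommendations above can be sketched as a single wrapper that every tool call must pass through. This is an illustrative assumption rather than a design from the paper: `ToolGateway`, `PolicyDenied`, and the two example tools are hypothetical names. The wrapper applies a per-task allowlist (least privilege), checks every call (complete mediation), and records each decision (observability of tool invocations).

```python
import json
import time
from typing import Any, Callable, Dict, List, Set


class PolicyDenied(Exception):
    """Raised when the model requests a tool outside its grant."""


class ToolGateway:
    """Hypothetical wrapper intended as the only path from model output to tools."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], allowed: Set[str]):
        self._tools = tools
        self._allowed = frozenset(allowed)  # fixed when the task starts
        self.audit_log: List[str] = []

    def _record(self, tool: str, args: Dict[str, Any], decision: str) -> None:
        # One JSON line per decision, including denied attempts.
        entry = {"ts": time.time(), "tool": tool, "args": args, "decision": decision}
        self.audit_log.append(json.dumps(entry, default=str))

    def invoke(self, tool: str, args: Dict[str, Any]) -> Any:
        # The model's request is treated as untrusted input to this check.
        if tool not in self._allowed or tool not in self._tools:
            self._record(tool, args, "deny")
            raise PolicyDenied(f"tool {tool!r} is not permitted for this task")
        self._record(tool, args, "allow")
        return self._tools[tool](**args)


def read_ticket(ticket_id: str) -> str:
    """Stand-in for a read-only tool."""
    return f"contents of {ticket_id}"


def send_email(to: str, body: str) -> str:
    """Stand-in for a tool with external side effects."""
    return f"sent to {to}"


if __name__ == "__main__":
    gateway = ToolGateway(
        {"read_ticket": read_ticket, "send_email": send_email},
        allowed={"read_ticket"},  # a summarization task needs only read access
    )
    print(gateway.invoke("read_ticket", {"ticket_id": "T-1"}))
    try:
        gateway.invoke("send_email", {"to": "x@example.com", "body": "data"})
    except PolicyDenied as err:
        print("denied:", err)
    print(len(gateway.audit_log), "audit entries")
```

In-process Python cannot truly prevent code from bypassing the gateway, so a production version would presumably place the tools behind a process or network boundary that only the gateway can cross. The sketch also logs tool calls only; tracking reasoning traces and memory interactions would need additional hooks.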

Sources