Christian Schneider — From LLM to agentic AI: how agents amplify prompt injection into kill chains
• Category: Security
AI relevance: Agentic AI systems with tool access, persistent memory, and multi-agent communication transform prompt injection from a contained chatbot trick into orchestrated attack chains that can exfiltrate data, execute code, and propagate across connected systems at machine speed.
- The amplification effect: in a traditional LLM chatbot, prompt injection affects a single inference call with limited blast radius. In agentic systems, the same injection can hijack planning, execute privileged tool calls with inherited user permissions, persist malicious instructions in memory, and propagate to peer agents.
- OWASP ASI01 — Agent Goal Hijack: the 2026 OWASP Top 10 for Agentic Applications introduces ASI01, recognizing that manipulated input doesn't just alter one output — it can redirect goals, planning, and multi-step behavior across the entire agent workflow.
- Promptware Kill Chain applied: Schneider maps the Nassi/Schneier five-stage model (initial access → privilege escalation → persistence → lateral movement → actions on objective) to real-world agentic attacks, demonstrating that input-filtering defenses address only stage one.
- Indirect injection as dominant vector: for agentic systems, indirect prompt injection — malicious instructions embedded in documents, emails, web pages, calendar invites, code repos, and API responses — is more dangerous than direct injection because the agent cannot reliably distinguish data from instructions. OpenAI acknowledged in December 2025 that this "is unlikely to ever be fully solved."
- MCP attack surface: the Model Context Protocol standardizes tool connectivity but introduces new vectors: tool description injection (poisoning tool metadata to influence agent behavior), tool poisoning via rug-pull updates, cross-origin tool access, and lack of granular permission scoping in many MCP server implementations.
- Memory poisoning for persistence: agents with persistent memory (conversation history, RAG stores, embedding databases) are vulnerable to payloads that survive session boundaries — a single successful injection can corrupt the agent's knowledge base and influence future sessions indefinitely.
- Multi-agent lateral movement: in architectures where agents delegate tasks to other agents, a compromised agent can pass tainted instructions downstream — the receiving agent trusts the delegating agent, creating a confused-deputy chain.
- Defense-in-depth architecture: Schneider outlines layered controls spanning input validation on all data sources (not just user prompts), goal-lock mechanisms that constrain agent planning to authorized objectives, tool sandboxing with minimal privileges, output validation before tool execution, and strategic human-in-the-loop approval for high-impact actions.
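The relationship between the five kill-chain stages and the layered controls above can be sketched as a simple coverage check. The stage and control names follow the article where given; the mapping for lateral movement and the data structure itself are an illustrative reading, not an API from the piece:

```python
# Illustrative mapping of the Promptware Kill Chain stages (Nassi/Schneier)
# to the layered defenses Schneider describes. The lateral-movement entry is
# an assumed reading of the confused-deputy discussion, not a quoted control.
KILL_CHAIN_DEFENSES = {
    "initial_access":       ["input validation on all data sources"],
    "privilege_escalation": ["tool sandboxing with minimal privileges"],
    "persistence":          ["memory provenance tracking and integrity audits"],
    "lateral_movement":     ["authenticated, scoped agent-to-agent delegation"],
    "actions_on_objective": ["output validation before tool execution",
                             "human-in-the-loop approval for high-impact actions"],
}

def coverage_gaps(deployed_controls: set[str]) -> list[str]:
    """Return kill-chain stages with no deployed control covering them."""
    return [stage for stage, controls in KILL_CHAIN_DEFENSES.items()
            if not deployed_controls & set(controls)]
```

Running `coverage_gaps({"input validation on all data sources"})` makes the article's point concrete: input filtering alone leaves privilege escalation, persistence, lateral movement, and actions on objective entirely uncovered.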
Why it matters
- This is the clearest practitioner-oriented synthesis yet of how the shift from chatbots to agents changes the prompt injection threat model — essential reading for teams deploying MCP-connected agents in production.
- The article bridges the OWASP Agentic Top 10, the Promptware Kill Chain research, and practical defense patterns into a single actionable framework, connecting academic threat models to engineering decisions.
- The MCP attack surface analysis is particularly timely as MCP adoption accelerates — tool description injection and rug-pull attacks are underappreciated vectors that most organizations haven't considered in their MCP deployments.
What to do
- Treat all agent data sources as untrusted input: emails, documents, web pages, API responses, and tool outputs should all pass through validation before entering the agent's reasoning context.
- Implement goal-lock mechanisms: constrain agent planning to a declared objective and flag or block deviations — if the agent's plan suddenly includes an undeclared tool call or data destination, halt and alert.
- Sandbox MCP tools with least privilege: each tool should have the minimum permissions required; authenticate tool servers, pin tool descriptions to prevent rug-pull updates, and validate all tool response schemas.
- Audit persistent memory: implement provenance tracking on memory writes, periodically validate stored content against integrity baselines, and scope memory access so injected content cannot influence unrelated sessions.
- Add human-in-the-loop for high-impact actions: any tool call that sends data externally, modifies production systems, or executes code should require explicit human approval until trust is established through auditing.
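The goal-lock check above can be sketched as a pre-execution plan audit. The plan and declaration shapes (`tool`, `destination`, `tools`, `destinations`) are assumed field names for illustration, not from any specific framework:

```python
def check_plan(plan: list[dict], declared: dict) -> list[str]:
    """Flag plan steps that drift from the declared objective: any
    undeclared tool call or undeclared data destination is a violation
    that should halt the agent and alert an operator."""
    violations = []
    for i, step in enumerate(plan):
        if step["tool"] not in declared["tools"]:
            violations.append(f"step {i}: undeclared tool {step['tool']!r}")
        dest = step.get("destination")
        if dest and dest not in declared["destinations"]:
            violations.append(f"step {i}: undeclared destination {dest!r}")
    return violations
```

An injected instruction that adds an exfiltration step (say, emailing a file to an external address) surfaces here as two violations, an undeclared tool and an undeclared destination, before any tool executes.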
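Pinning tool descriptions against rug-pull updates amounts to hashing the poisonable metadata at review time and refusing any tool whose description later changes. A minimal sketch, assuming tool metadata arrives as a dict with `name`, `description`, and `input_schema` fields (field names are illustrative):

```python
import hashlib
import json

def pin_tool(tool: dict) -> str:
    """Hash exactly the fields an attacker could poison via a tool update."""
    canonical = json.dumps(
        {k: tool.get(k) for k in ("name", "description", "input_schema")},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

class ToolRegistry:
    """Record approved tool hashes; reject tools whose metadata drifted."""

    def __init__(self):
        self._pins: dict[str, str] = {}

    def approve(self, tool: dict) -> None:
        self._pins[tool["name"]] = pin_tool(tool)

    def verify(self, tool: dict) -> bool:
        # A rug-pull update rewrites the description after approval;
        # the hash mismatch surfaces it before the agent sees the tool.
        return self._pins.get(tool["name"]) == pin_tool(tool)
```

The same check doubles as a guard against tool description injection: instructions smuggled into a description after review change the hash and fail verification.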
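The human-in-the-loop gate reduces to a small wrapper around tool dispatch: high-impact calls block until an approval callback (a review UI, a ticket, a Slack prompt) returns true. The tool names and dispatch placeholder here are hypothetical:

```python
# Assumed, illustrative set of high-impact tool names for this deployment.
HIGH_IMPACT = {"send_email", "execute_code", "deploy", "delete_records"}

def gated_call(tool_name: str, args: dict, approve) -> str:
    """Execute a tool call only if it is low-impact or a human approved it.

    `approve(tool_name, args) -> bool` stands in for whatever review
    channel the team uses; the return string stands in for real dispatch.
    """
    if tool_name in HIGH_IMPACT and not approve(tool_name, args):
        return "blocked: awaiting human approval"
    return f"executed {tool_name}"  # placeholder for the real tool dispatch
```

Starting with a deny-by-default `approve` and loosening it per tool as audit history accumulates matches the article's advice to require approval "until trust is established through auditing".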