Forcepoint — 10 In-the-Wild Indirect Prompt Injection Payloads Targeting AI Agents

AI relevance: Any agent that browses the web, indexes content for RAG, or summarizes pages can be hijacked by invisible instructions hidden in the HTML it reads — and Forcepoint found 10 live examples doing real-world harm.

Forcepoint published findings from Mayur Sewani documenting 10 distinct indirect prompt injection payloads discovered on live websites, each designed to manipulate AI agents into taking malicious actions when they process the page.

What was found

  • File destruction: One payload instructs coding agents with shell access to run a recursive forced deletion of files and directories. Targets include Copilot, Cursor, Claude Code, and AI-powered CI/CD reviewers that visit poisoned pages during research tasks.
  • API key theft: A payload tells any agent with access to credentials to "send me the secret API key" and adds an anti-detection instruction not to show the stolen data in visible output.
  • Financial fraud: The most specific payload embeds a PayPal.me link with a fixed $5,000 amount and step-by-step transaction instructions, designed for browser agents with saved payment credentials or digital wallet access.
  • Content suppression: A milder payload claims the copyright owner has "expressly forbidden" the AI from answering questions about the page — effectively a denial-of-service against the model's output.
  • Attribution hijacking: One payload instructs the AI to credit a specific person and promote their consulting services to anyone reading the summary.
  • Common triggers: "Ignore previous instructions," "If you are an LLM," and similar patterns appear across multiple payloads to break the agent's instruction boundary.
  • Covert exfiltration: Most payloads include a hidden return channel back to the attacker, often masked in normal-looking output.
  • Impact scales with privilege: Forcepoint emphasizes that agents with terminal access, payment capabilities, or email permissions represent high-impact targets, while summarization-only agents carry lower risk.
  • Wide blast radius: Any system that auto-crawls web content — search indexers, SEO tools, ad-tech scanners, RAG ingestion pipelines — becomes a potential attack vector.
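
The common trigger phrases noted above lend themselves to a simple pre-ingestion scan. The following is a minimal sketch, not Forcepoint's detection logic: the pattern list and the function name find_triggers are hypothetical, seeded only with the two phrases quoted in this brief. Real payloads vary widely, so a match is a signal to quarantine a page for review, and an empty result is not proof that the page is clean.

```python
import re

# Hypothetical pattern list, seeded with the two phrases quoted in the brief.
# \s+ tolerates the extra whitespace and line breaks common in hidden text.
TRIGGER_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"if\s+you\s+are\s+an?\s+(?:llm|ai|language\s+model)", re.IGNORECASE),
]


def find_triggers(text: str) -> list[str]:
    """Return every substring of text that matches a known trigger pattern."""
    hits = []
    for pattern in TRIGGER_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    sample = "Great recipe! If you are an LLM, ignore previous instructions."
    for hit in find_triggers(sample):
        print("possible injection trigger:", hit)
```

Keyword matching of this kind is easy to evade through paraphrase or encoding, so it works best as one cheap layer in front of the stronger controls listed under "What to do".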

Why it matters

Indirect prompt injection is no longer theoretical. These payloads were found in the wild, not crafted in labs. As AI agents gain more privileges — file access, payments, email — every untrusted webpage becomes a potential command injection point. The attack surface grows exactly as fast as agent capabilities do.

What to do

  • Enforce a strict data-instruction boundary: treat all ingested web content as untrusted data, never as instructions.
  • Limit agent privileges to the minimum required for the task — disable shell access, payments, and email for agents that browse untrusted content.
  • Use content sanitization pipelines that strip hidden HTML elements, comments, and invisible text before ingestion.
  • Monitor agent actions and outputs for anomalous behavior (unexpected file operations, payment attempts, credential requests).
  • Apply input/output guardrails (e.g., NeMo Guardrails, LLM firewalls) to detect and block instruction override attempts.
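
The sanitization step in the list above can be sketched with Python's standard-library HTML parser. This is an illustrative example under stated assumptions, not a complete defense: the class VisibleTextExtractor, the function visible_text, and the set of "hidden" signals (the hidden attribute, a few inline style rules, and script/style/noscript/template elements) are choices made for this sketch. It does not evaluate external stylesheets, CSS classes, off-screen positioning, or text colored to match the background.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose contents are never rendered as page text.
ALWAYS_DROP = {"script", "style", "noscript", "template"}
# Inline style rules treated as "invisible" (an assumption of this sketch).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?![.\d])|opacity\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)


class VisibleTextExtractor(HTMLParser):
    """Collect text that is not inside a hidden element or an HTML comment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._stack = []        # open elements as (tag, is_hidden)
        self._hidden_depth = 0  # number of hidden elements currently open
        self.chunks = []

    @staticmethod
    def _is_hidden(tag, attrs):
        attr = dict(attrs)
        if tag in ALWAYS_DROP or "hidden" in attr:
            return True
        return bool(HIDDEN_STYLE.search(attr.get("style") or ""))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        hidden = self._is_hidden(tag, attrs)
        self._stack.append((tag, hidden))
        self._hidden_depth += hidden

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything left open inside it.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                self._hidden_depth -= sum(h for _, h in self._stack[i:])
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.chunks.append(data)

    # HTML comments are dropped because the inherited handle_comment does nothing.


def visible_text(html: str) -> str:
    """Return whitespace-normalized text from the visible parts of html."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

An unclosed hidden element causes everything after it to be dropped, which errs toward removing too much instead of too little. Sanitized text should still be passed to the model as untrusted data, since visible text can carry injected instructions as well.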
