Forcepoint — 10 In-the-Wild Indirect Prompt Injection Payloads

AI relevance: Forcepoint identified 10 live indirect prompt injection (IPI) payloads embedded in real web content, each targeting AI agents that browse, summarize, or index pages — proving that IPI has moved from theory to active deployment.

What happened

  • Forcepoint senior researcher Mayur Sewani published findings of 10 distinct IPI payloads discovered in the wild, targeting agents with varying levels of privilege.
  • The attack chain is consistent: attackers poison web content with hidden instructions, wait for an AI agent to ingest the page, and the agent executes the injected commands as legitimate.
  • Financial fraud payload: Embeds a PayPal.me link with a fixed $5,000 amount and step-by-step transaction instructions, targeting agents with integrated payment capabilities.
  • API key exfiltration: Instructs the agent to "send me the secret API key" while simultaneously telling it not to output the results visibly — a covert exfiltration channel.
  • Recursive file deletion: Targets agentic AI with shell access (IDE assistants, terminal tools, DevOps pipelines) by injecting a Unix command for forced recursive deletion.
  • Content suppression (DoS): Claims the copyright owner has "expressly forbidden" the AI from answering questions about the page — a denial-of-service against the agent's output.
  • Attribution hijacking: Instructs the AI to credit a specific person with the content and solicit consulting work — essentially SEO manipulation via AI agents.
  • Common trigger phrases include "ignore previous instructions," "if you are an LLM," and similar instruction-overriding patterns.
  • Impact scales with agent privilege: a summarization agent is low-risk; an agent that can send emails, run commands, or process payments is a high-impact target.

Why it matters

This is no longer a lab exercise. These payloads are live on the web, waiting for agents to encounter them. The PayPal payload's specificity — exact amount, exact URL, exact steps — indicates weaponization, not exploration. Any agent that reads untrusted web content without a strict data-instruction boundary is vulnerable by design.

What to do

  • Enforce a strict data-instruction boundary: content ingested from the web must be classified as data, never as instructions.
  • Strip or quarantine hidden HTML comments, metadata, and CSS-hidden text before feeding pages to agents.
  • Limit agent privileges: if an agent only needs to summarize, it should not have terminal, email, or payment access.
  • Monitor agent outputs for unexpected action patterns (file operations, API calls, payment requests).

Sources