Forcepoint — 10 In-the-Wild Indirect Prompt Injection Payloads
AI relevance: Forcepoint identified 10 live indirect prompt injection (IPI) payloads embedded in real web content, each targeting AI agents that browse, summarize, or index pages — proving that IPI has moved from theory to active deployment.
What happened
- Forcepoint senior researcher Mayur Sewani published findings of 10 distinct IPI payloads discovered in the wild, targeting agents with varying levels of privilege.
- The attack chain is consistent: attackers poison web content with hidden instructions, wait for an AI agent to ingest the page, and rely on the agent to execute the injected commands as if they were legitimate user instructions.
- Financial fraud payload: Embeds a PayPal.me link with a fixed $5,000 amount and step-by-step transaction instructions, targeting agents with integrated payment capabilities.
- API key exfiltration: Instructs the agent to "send me the secret API key" while simultaneously telling it not to output the results visibly — a covert exfiltration channel.
- Recursive file deletion: Targets agentic AI with shell access (IDE assistants, terminal tools, DevOps pipelines) by injecting a Unix command for forced recursive deletion.
- Content suppression (DoS): Claims the copyright owner has "expressly forbidden" the AI from answering questions about the page — a denial-of-service against the agent's output.
- Attribution hijacking: Instructs the AI to credit a specific person with the content and solicit consulting work — essentially SEO manipulation via AI agents.
- Common trigger phrases include "ignore previous instructions," "if you are an LLM," and similar instruction-overriding patterns.
- Impact scales with agent privilege: a summarization agent is low-risk; an agent that can send emails, run commands, or process payments is a high-impact target.
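The trigger phrases above suggest a coarse pre-ingestion screen: scan fetched content for instruction-override patterns before it ever reaches the agent. The sketch below is illustrative only; the phrase list is seeded from the patterns named in the findings and would need to be far broader (and continuously maintained) in a real deployment.

```python
import re

# Illustrative instruction-override patterns, based on the trigger phrases
# reported in the wild ("ignore previous instructions", "if you are an LLM",
# covert "do not show the results" exfiltration framing). Not exhaustive.
TRIGGER_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"if\s+you\s+are\s+an?\s+(llm|ai|language\s+model)",
    r"do\s+not\s+(show|output|display)\s+(this|the\s+results?)",
]

def flag_suspect_content(text: str) -> list[str]:
    """Return the trigger phrases found in `text`, case-insensitively."""
    hits = []
    for pattern in TRIGGER_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            hits.append(match.group(0))
    return hits
```

A screen like this catches only the lazy payloads; paraphrased or obfuscated injections will slip past it, which is why it belongs in front of, not instead of, the architectural controls below.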
Why it matters
This is no longer a lab exercise. These payloads are live on the web, waiting for agents to encounter them. The PayPal payload's specificity — exact amount, exact URL, exact steps — indicates weaponization, not exploration. Any agent that reads untrusted web content without a strict data-instruction boundary is vulnerable by design.
What to do
- Enforce a strict data-instruction boundary: content ingested from the web must be classified as data, never as instructions.
- Strip or quarantine hidden HTML comments, metadata, and CSS-hidden text before feeding pages to agents.
- Limit agent privileges: if an agent only needs to summarize, it should not have terminal, email, or payment access.
- Monitor agent outputs for unexpected action patterns (file operations, API calls, payment requests).
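The stripping step can be sketched with the standard library alone. This is a minimal example of the idea, not a production sanitizer: it drops HTML comments, script/style blocks, and elements hidden via inline CSS or the `hidden` attribute, which are the smuggling channels described above. Real pages hide text in many more ways (external stylesheets, zero-size fonts, off-screen positioning) that this does not cover.

```python
from html.parser import HTMLParser

# Tags that never carry a closing tag; pushing them onto the stack
# would desynchronize start/end bookkeeping.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "area",
             "base", "col", "embed", "source", "track", "wbr"}

class VisibleTextExtractor(HTMLParser):
    """Collect only user-visible text, discarding the channels commonly
    used to hide injected instructions from human reviewers."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_stack = []  # True while inside a hidden subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        style = (a.get("style") or "").lower().replace(" ", "")
        hidden = (
            tag in ("script", "style", "template")
            or "hidden" in a
            or a.get("aria-hidden") == "true"
            or "display:none" in style
            or "visibility:hidden" in style
        )
        # A subtree is hidden if this element is, or any ancestor was.
        self.hidden_stack.append(
            bool(hidden or (self.hidden_stack and self.hidden_stack[-1]))
        )

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        inside_hidden = self.hidden_stack and self.hidden_stack[-1]
        if not inside_hidden and data.strip():
            self.chunks.append(data.strip())

    # HTML comments are dropped automatically: handle_comment is not
    # implemented, so commented-out payloads never reach the output.

def visible_text(page_html: str) -> str:
    """Return only the visible text of a page, for handoff to an agent."""
    parser = VisibleTextExtractor()
    parser.feed(page_html)
    return " ".join(parser.chunks)
```

Even after stripping, the surviving text should still be framed to the model as untrusted data, never as instructions; sanitization narrows the attack surface but cannot substitute for the data-instruction boundary.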