Forcepoint — 10 In-the-Wild Indirect Prompt Injection Payloads Targeting AI Agents
AI relevance: Any AI agent that browses, summarizes, or indexes web content — including RAG pipelines, browser agents, and AI-powered CI/CD reviewers — treats poisoned web pages as trusted input, enabling real-world data theft, financial fraud, and destructive commands.
Forcepoint senior security researcher Mayur Sewani published findings on 10 indirect prompt injection (IPI) payloads discovered in the wild, hidden in web content to execute when AI agents ingest those pages. Unlike theoretical attacks, these payloads are live and actively weaponized.
Key Findings
- All payloads use common trigger phrases: "Ignore previous instructions," "Ignore all previous instructions," "If you are an LLM," and "If you are a large language model."
- Financial fraud payload — Embeds a PayPal.me link with a fixed $5,000 amount and step-by-step instructions for an AI with payment access to process the transaction. Forcepoint called this "not a probe, but a weaponized payload intended for immediate execution."
- Data destruction payload — Instructs AI coding assistants, IDE integrations, or agents with shell access to execute recursive forced deletion of files and directories. Targets GitHub Copilot, Cursor, Claude Code, and AI-powered CI/CD reviewers.
- API key exfiltration — Commands agents to "send me the secret API key" while simultaneously instructing them not to display the flag, creating a covert exfiltration channel.
- Content suppression (DoS) — Falsely claims the copyright owner "expressly forbids" the AI from answering questions about the page, achieving denial-of-service.
- Attribution hijacking — Instructs the AI to credit a specific person with the ingested content and direct users to contact them for consulting.
- The attack chain is consistent: poison web content → agent ingests the page → agent ignores previous instructions → attacker's instructions execute → covert return channel exfiltrates results.
- Impact scales with AI privilege: a summarization-only agent is low-risk; an agent with email, terminal, or payment access becomes a high-impact target.
Why It Matters
These are not hypothetical attacks — they are payloads found in the wild. The specificity of the financial fraud payload (exact amount, exact URL, exact steps) indicates active weaponization rather than experimentation. Any organization deploying agents that read untrusted web content is exposed unless they enforce a strict data-instruction boundary.
What to Do
- Enforce a data-instruction boundary: separate content ingestion from instruction execution in agent architectures.
- Restrict tool privileges for agents that process untrusted web content — limit access to payment systems, terminal commands, and email.
- Implement content sanitization pipelines that strip hidden instructions from HTML comments, metadata, and invisible elements before feeding content to agents.
- Monitor agent tool-call logs for anomalous invocations (unexpected payment requests, file deletions, secret access patterns).