Unit 42 — Web-based indirect prompt injection observed in the wild

AI relevance: AI agents that browse or summarize web content can be coerced by hidden instructions embedded in pages, turning routine ingestion into attacker-controlled actions.

  • Unit 42 reports web-based indirect prompt injection (IDPI) observed in real-world telemetry, not just lab PoCs.
  • Attackers embed hidden instructions in webpages that are later consumed by LLM-powered tools for summarization or analysis.
  • Observed intents include ad-review evasion, SEO manipulation promoting phishing, and system prompt leakage.
  • Other outcomes noted: data destruction, denial of service, unauthorized transactions, and sensitive data leakage.
  • The team cataloged 22 distinct techniques for crafting web-based IDPI payloads.
  • Risk scales with agent privileges: the more tools an agent can use, the higher the blast radius.

Security impact

Indirect prompt injection is the “supply chain” of the prompt world. Instead of attacking the model directly, adversaries place malicious instructions in web content, documents, or APIs the agent consumes. When a web-browsing agent fetches that content, the injection is executed inside the model context, hijacking the agent’s goals. This is particularly dangerous because the attack vector looks like normal data ingestion.

For AI systems with tool access, indirect prompt injection can trigger data exfiltration, tool misuse, and unauthorized actions. In enterprise settings, it can leak CRM data, internal emails, or cloud metadata. The risk scales with autonomy: the more actions an agent can take without human review, the more power the injection has.

Mitigation strategy

Implement content sanitization and strict separation between untrusted content and system instructions. Use allowlisted tools with explicit schemas, apply output filters, and require human confirmation for high-impact actions. Treat web content as untrusted code and enforce a policy that the model must ignore instructions from external documents.

Why it matters

  • Web content becomes a prompt delivery channel for agents used in browsing, support, and automation pipelines.
  • Indirect attacks bypass traditional “user input” protections because they arrive via normal web ingestion.
  • Defenders need web-scale filtering and intent detection to separate benign content from injected instructions.

What to do

  • Isolate browsing agents: run them with minimal tool access and no direct payment or admin actions.
  • Harden ingestion: strip or neutralize hidden text, comments, and metadata before LLM processing.
  • Require confirmations: gate high-risk actions (transactions, data exports) behind explicit user approval.
  • Monitor for IDPI: log and flag prompt-like patterns in retrieved web content.

Sources