ArXiv — Indirect Prompt Injection in the Wild: 15.3K Instances Across 24.8M Hosts

AI relevance: This is the first large-scale empirical study of how prevalent indirect prompt injection (IPI) is on the public web, and it also measures how often AI models actually comply with instructions embedded in webpages.

Khodayari et al. analyzed 1.2 billion URLs across 24.8 million hosts, identifying 15,300 validated injection instances on 11,700 pages — the most comprehensive measurement to date of IPI as a deployed threat rather than a theoretical attack.

  • Scale and recurrence: A small number of recurring injection templates account for most cases, suggesting automated or semi-automated deployment rather than hand-crafted attacks.
  • Machine-targeted: Roughly 70% of injections live in non-rendered HTML — headers, comments, metadata — invisible to human visitors but fully consumed by web-browsing AI agents.
  • Stealth via rendering: Many injections that sit in rendered content use CSS hiding, zero-width characters, or off-screen positioning to evade casual detection while remaining legible to models.
  • Objectives span four categories: disruptive prompts (denial-of-service behavior), reputation manipulation (SEO-style injection for AI summaries), content-protection directives (anti-crawler instructions), and AI-bot detection.
  • Targets include crawlers, search pipelines, customer-support agents, and hiring workflows — any system that ingests web content and acts on it.
  • Model compliance varies: Controlled experiments across 13 models and 5,200 test cases showed compliance rates of up to 8% for smaller models on plain-text inputs; compliance dropped significantly when pages were instead supplied as structured HTML that preserved structural cues.
  • Structured inputs as partial defense: The study found that representing webpages with explicit structure (preserving DOM hierarchy) reduced injection compliance, suggesting that input representation design matters as much as prompt engineering.
  • Heterogeneous ecosystem: Injections target a wide range of downstream systems, not just chat interfaces — the attack surface extends to any LLM pipeline that fetches and processes untrusted web content.
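
To make the structured-input finding concrete, here is a minimal sketch of one way to keep DOM provenance attached to every text fragment before it reaches a model. This summary does not describe the representation the paper actually used, so the record layout and the names OutlineBuilder, structured_view, and render_for_model are assumptions for illustration only. The sketch uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OutlineBuilder(HTMLParser):
    """Collects (dom_path, kind, text) records instead of one flat text blob.

    Hypothetical helper: the record shape is an assumption, not the paper's.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []    # currently open tags, outermost first
        self.records = []  # (dom_path, kind, text)

    def _path(self):
        return "/" + "/".join(self.stack)

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            content = dict(attrs).get("content")
            if content:
                self.records.append((self._path() + "/meta", "metadata", content))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.records.append((self._path(), "text", text))

    def handle_comment(self, data):
        text = " ".join(data.split())
        if text:
            self.records.append((self._path(), "comment", text))


def structured_view(html):
    """Return provenance-tagged records for every text-bearing fragment."""
    parser = OutlineBuilder()
    parser.feed(html)
    parser.close()
    return parser.records


def render_for_model(html):
    """Serialize the records so each line states where its text came from."""
    return "\n".join(f"[{kind}] {path}: {text}"
                     for path, kind, text in structured_view(html))
```

With this layout, a directive hidden in a comment arrives labeled as a comment under its DOM path rather than blending into body copy, which is one plausible reading of "preserving structural cues". Whether a given model makes use of those cues is an empirical question that this sketch does not answer.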

Why it matters

This study complements Google's Common Crawl analysis and Forcepoint's 10-payload report with independent, peer-reviewed methodology. The key contribution is quantitative: we now have baseline numbers for IPI prevalence (15.3K instances across 11.7K pages) and measurable model compliance rates (up to 8%), moving the discussion from "this could happen" to "this is happening at this scale." The finding that structured HTML representations reduce compliance is a practical design insight for teams building web-browsing agents.
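
For a sense of scale, the headline figures can be turned into simple rates. The arithmetic below uses only the numbers quoted above and assumes that each of the 11,700 pages corresponds to one of the 1.2 billion analyzed URLs; the variable names are illustrative.

```python
# Figures as quoted in this summary.
urls_analyzed = 1_200_000_000
instances = 15_300
pages = 11_700

# Average number of validated injections on a page that has at least one.
instances_per_page = instances / pages

# Injected pages per million analyzed URLs (assumes one page per URL).
pages_per_million_urls = pages / urls_analyzed * 1_000_000

print(f"{instances_per_page:.2f} instances per affected page")      # about 1.31
print(f"{pages_per_million_urls:.2f} affected pages per million URLs")  # about 9.75
```

On these assumptions, roughly ten in every million analyzed URLs carried a validated injection: rare per page, but non-trivial for an agent that fetches content at volume.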

What to do

  • If your agents browse or ingest web content, treat all fetched HTML as untrusted input — apply the same sanitization you would to user-submitted prompts.
  • Use structured HTML representations rather than plain-text extraction when feeding web content to models; the study shows this measurably reduces injection compliance.
  • Monitor fetched content for hidden HTML elements (comments, metadata, off-screen text) that could carry injected instructions.
  • Review the paper's methodology to understand which models and representations are most resilient, and align your agent architecture accordingly.
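
The monitoring step can be sketched as a small scanner that reports the carriers named in this summary: HTML comments, meta content, inline CSS hiding, and zero-width characters. This is a hedged starting point, not the paper's detection pipeline. The patterns, the function name scan, and the finding labels are all assumptions, and the scanner inspects inline style attributes only, not external stylesheets.

```python
import re
from html.parser import HTMLParser

# Common zero-width code points (assumed list, not exhaustive).
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

# Inline-style patterns that typically hide text from human visitors.
HIDING_CSS = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|"
    r"(?:left|top|text-indent)\s*:\s*-\d{3,}",
    re.IGNORECASE,
)


class HiddenCarrierScanner(HTMLParser):
    """Records (carrier, snippet) pairs for content a human would not see."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []

    def _add(self, carrier, text):
        # Normalize whitespace and cap the snippet length for logging.
        self.findings.append((carrier, " ".join(text.split())[:120]))

    def handle_comment(self, data):
        if data.strip():
            self._add("comment", data)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content"):
            self._add("meta", attrs["content"])
        if HIDING_CSS.search(attrs.get("style") or ""):
            self._add("css-hidden", f"<{tag} style={attrs['style']!r}>")

    def handle_data(self, data):
        if ZERO_WIDTH.search(data):
            self._add("zero-width", ZERO_WIDTH.sub("", data))


def scan(html):
    """Return every hidden-carrier finding in the given HTML string."""
    scanner = HiddenCarrierScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

Most comments and meta tags are benign, so findings like these are better suited to logging and review than to automatic blocking. Deciding which snippets actually read as instructions is left to the caller.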

Sources