arXiv — Image-based prompt injection against multimodal LLMs

AI relevance: The paper shows how images can smuggle instructions that override multimodal LLM behavior, a direct risk for vision-enabled agents and RAG pipelines.

  • The authors study image-based prompt injection (IPI) where adversarial text is embedded inside natural images.
  • The pipeline uses segmentation-based region selection to place hidden instructions in visually plausible regions.
  • Adaptive font scaling and background-aware rendering help keep the injected text stealthy to humans while readable to the model.
  • Evaluation uses the COCO dataset with GPT‑4‑turbo as the target MLLM.
  • Across 12 adversarial prompt strategies, the strongest setup reached up to 64% attack success under stealth constraints.
  • The attack is framed as a black-box threat, meaning it does not require model internals or gradients.
  • The paper argues IPI is a practical threat for vision-enabled apps and calls for dedicated defenses.

Why it matters

  • Vision-enabled agents can be hijacked by untrusted images in emails, tickets, or web pages.
  • Stealthy injections reduce the chance of human review catching malicious prompts.
  • Black-box feasibility means realistic attacker access without insider knowledge.

What to do

  • Gate image inputs: treat images as untrusted prompt material and strip/blur embedded text when possible.
  • Instrument vision pipelines: log OCR outputs and flag anomalous instruction-like strings.
  • Red-team multimodal flows: test image ingestion paths with adversarial prompt overlays.

Sources