arXiv — Image-based prompt injection against multimodal LLMs
AI relevance: The paper shows how images can smuggle instructions that override multimodal LLM behavior, a direct risk for vision-enabled agents and RAG pipelines.
- The authors study image-based prompt injection (IPI) where adversarial text is embedded inside natural images.
- The pipeline uses segmentation-based region selection to place hidden instructions in visually plausible regions.
- Adaptive font scaling and background-aware rendering help keep the injected text stealthy to humans while readable to the model.
- Evaluation uses the COCO dataset with GPT‑4‑turbo as the target MLLM.
- Across 12 adversarial prompt strategies, the strongest setup reached up to 64% attack success under stealth constraints.
- The attack is framed as a black-box threat, meaning it does not require model internals or gradients.
- The paper argues IPI is a practical threat for vision-enabled apps and calls for dedicated defenses.
Why it matters
- Vision-enabled agents can be hijacked by untrusted images in emails, tickets, or web pages.
- Stealthy injections reduce the chance of human review catching malicious prompts.
- Black-box feasibility means realistic attacker access without insider knowledge.
What to do
- Gate image inputs: treat images as untrusted prompt material and strip/blur embedded text when possible.
- Instrument vision pipelines: log OCR outputs and flag anomalous instruction-like strings.
- Red-team multimodal flows: test image ingestion paths with adversarial prompt overlays.