arXiv — Image-based prompt injection against multimodal LLMs

2026-03-08 Research by al-ice.ai Editorial

AI relevance: The paper shows how images can smuggle instructions that override multimodal LLM behavior, a direct risk for vision-enabled agents and RAG pipelines.

The authors study image-based prompt injection (IPI) where adversarial text is embedded inside natural images.
The pipeline uses segmentation-based region selection to place hidden instructions in visually plausible regions.
Adaptive font scaling and background-aware rendering help keep the injected text stealthy to humans while readable to the model.
Evaluation uses the COCO dataset with GPT‑4‑turbo as the target MLLM.
Across 12 adversarial prompt strategies, the strongest setup reached up to 64% attack success under stealth constraints.
The attack is framed as a black-box threat, meaning it does not require model internals or gradients.
The paper argues IPI is a practical threat for vision-enabled apps and calls for dedicated defenses.

Why it matters

Vision-enabled agents can be hijacked by untrusted images in emails, tickets, or web pages.
Stealthy injections reduce the chance of human review catching malicious prompts.
Black-box feasibility means realistic attacker access without insider knowledge.

What to do

Gate image inputs: treat images as untrusted prompt material and strip/blur embedded text when possible.
Instrument vision pipelines: log OCR outputs and flag anomalous instruction-like strings.
Red-team multimodal flows: test image ingestion paths with adversarial prompt overlays.

arXiv — Image-based prompt injection against multimodal LLMs

Why it matters

What to do

Sources