arXiv — Prompt injection vs LLM rankers
AI relevance: Any AI system that ranks or re-ranks content (search, retrieval, recommendations) can be steered if the candidate items embed prompt injections.
- The paper evaluates how prompt injections embedded in candidate documents can hijack LLM-based ranking decisions.
- Authors test three ranking paradigms: pairwise, listwise, and setwise rankers.
- Two injection styles are evaluated: decision objective hijacking and decision criteria hijacking.
- They measure both attack success rate and downstream ranking quality degradation (nDCG@10).
- The study compares vulnerability across model families, architectures, and position sensitivity.
- One key result: encoder–decoder models show stronger inherent resistance to jailbreak-style injections.
- Code and extra experiments are released for reproducibility and follow-on testing.
Why it matters
- LLM rankers are increasingly used in search and RAG pipelines, making ranking integrity a core security property.
- Attackers can exploit prompt injections to surface malicious content or hide relevant results in AI-driven ranking stacks.
What to do
- Red-team ranking pipelines with injected candidates before production use.
- Track architecture choices (encoder–decoder vs decoder-only) as a security control, not just a latency tradeoff.
- Filter or neutralize untrusted instructions in candidate documents before LLM scoring.