arXiv — Systematic Review of LLM Defenses Against Prompt Injection: Expanding NIST Taxonomy

2026-02-05 Research by al-ice.ai Editorial

Barcha Correia et al. present the first systematic literature review (SLR) specifically focused on prompt injection and jailbreak mitigation strategies for LLMs, covering 88 studies.
The paper builds on NIST's adversarial machine learning report (AI 100-2e2025), extending its taxonomy with additional defense categories not previously documented.
Key contribution: a comprehensive catalog of all 88 reviewed defenses, documenting quantitative effectiveness across specific LLMs and attack datasets, plus flags for open-source availability and model-agnostic applicability.
Defense categories covered span input filtering, output filtering, prompt engineering, fine-tuning, ensemble methods, probing-based detection, and more—each mapped to NIST's standardised terminology.
The review identifies studies beyond those in NIST's report and other existing surveys, filling gaps in the evolving landscape of prompt injection countermeasures.
Practical focus: the catalog is designed as a reference for developers building production systems, not just for academic researchers—each defense includes implementation notes and reported metrics.
Submitted to Elsevier Computer Science Review; 27 pages, 14 figures, 11 tables.

Why it matters

Prompt injection remains the #1 unsolved security problem for LLM-based applications. Having a structured, NIST-aligned taxonomy of defenses helps practitioners choose and layer mitigations systematically rather than ad hoc.
The catalog of 88 defenses with comparable effectiveness metrics is immediately useful for security teams evaluating which safeguards to deploy.
By adopting NIST terminology, the work enables consistent cross-study comparison—a prerequisite for the field maturing from scattered one-off fixes to engineering discipline.

Use the catalog: If you're deploying LLM-based features, review the paper's defense matrix to identify which mitigations apply to your architecture and threat model.
Layer defenses: No single technique is sufficient. The SLR reinforces that effective protection requires combining input/output filtering, prompt hardening, and runtime monitoring.
Track NIST updates: The extended taxonomy provides a living framework—watch for future NIST revisions that may incorporate these additions.
Benchmark before shipping: Use the reported attack datasets and success rates as baselines to test your own defenses before production deployment.