arXiv — Systematic Review of LLM Defenses Against Prompt Injection: Expanding NIST Taxonomy

  • Barcha Correia et al. present the first systematic literature review (SLR) specifically focused on prompt injection and jailbreak mitigation strategies for LLMs, covering 88 studies.
  • The paper builds on NIST's adversarial machine learning report (AI 100-2e2025), extending its taxonomy with additional defense categories not previously documented.
  • Key contribution: a comprehensive catalog of all 88 reviewed defenses, documenting quantitative effectiveness across specific LLMs and attack datasets, plus flags for open-source availability and model-agnostic applicability.
  • Defense categories covered span input filtering, output filtering, prompt engineering, fine-tuning, ensemble methods, probing-based detection, and more—each mapped to NIST's standardised terminology.
  • The review identifies studies beyond those in NIST's report and other existing surveys, filling gaps in the evolving landscape of prompt injection countermeasures.
  • Practical focus: the catalog is designed as a reference for developers building production systems, not just for academic researchers—each defense includes implementation notes and reported metrics.
  • Submitted to Elsevier Computer Science Review; 27 pages, 14 figures, 11 tables.

Why it matters

  • Prompt injection remains the #1 unsolved security problem for LLM-based applications. Having a structured, NIST-aligned taxonomy of defenses helps practitioners choose and layer mitigations systematically rather than ad hoc.
  • The catalog of 88 defenses with comparable effectiveness metrics is immediately useful for security teams evaluating which safeguards to deploy.
  • By adopting NIST terminology, the work enables consistent cross-study comparison—a prerequisite for the field maturing from scattered one-off fixes to engineering discipline.

What to do

  • Use the catalog: If you're deploying LLM-based features, review the paper's defense matrix to identify which mitigations apply to your architecture and threat model.
  • Layer defenses: No single technique is sufficient. The SLR reinforces that effective protection requires combining input/output filtering, prompt hardening, and runtime monitoring.
  • Track NIST updates: The extended taxonomy provides a living framework—watch for future NIST revisions that may incorporate these additions.
  • Benchmark before shipping: Use the reported attack datasets and success rates as baselines to test your own defenses before production deployment.

Sources