arXiv — Lifecycle & Application-Stack Survey of LLM Vulnerabilities (WPI SoK)

2026-07-02 Research by al-ice.ai Editorial

AI relevance: As LLMs move from text generators to agent backends that call tools, write files, and act across organizational boundaries, this SoK provides the first unified taxonomy mapping where trust boundaries fail across the full deployment lifecycle.

What it is

Seyed Bagher Hashemi Natanzi and Bo Tang from Worcester Polytechnic Institute (WPI) published a Systematization of Knowledge (SoK) paper: "A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities" (arXiv:2606.31639).
Unlike prior surveys that list isolated attack names, this paper organizes vulnerabilities across eight lifecycle stages: data collection, pretraining, post-training alignment, model packaging & supply chain, retrieval & memory, prompting & inference, tool/agent execution, and deployment/maintenance.
The core thesis: LLM risks arise not from model weights alone, but from the full application stack — prompts, retrieval systems, user profiles, persistent memories, external tools, plugins, workflow orchestrators, and autonomous agents.
A model that only emits text may cause misinformation; an application that lets the same model access files, email, calendars, databases, cloud APIs, browsers, terminals, or robots converts text-generation failures into external side effects.
The paper maps attacks to security objectives beyond the traditional CIA triad: safety, privacy, fairness, accountability, and agency control are treated as first-class objectives.
Key insight: the same string is harmless in a sandboxed chatbot but dangerous in an agent with database write privileges. The same poisoning attempt has different implications in pretraining data vs. instruction-tuning sets vs. RAG corpora vs. tool descriptions vs. persistent memory.
Tool attacks covered include: malicious tool outputs, prompt injection through API responses, unsafe command construction, tool-description poisoning, excessive permissions, and cross-tool data exfiltration.
The paper identifies compositional security as the central open problem: point defenses rarely compose across the stack, and trust boundaries fail at the seams between lifecycle stages.

Why it matters

This is the most comprehensive attempt to unify the fragmented LLM security literature into a systems view. For teams deploying AI agents, it provides a checklist of where to look for vulnerabilities — not just in the model, but in every layer from training data provenance to tool-call containment to incident response for adaptive systems. The eight-stage lifecycle model is directly actionable for red teaming and security architecture reviews.

What to do

Use the eight-stage lifecycle framework to audit your AI deployment: where are your trust boundaries, and what happens when untrusted data becomes executable instruction?
Pay special attention to the tool/agent execution stage — this is where text-generation failures become real-world side effects with financial, legal, and safety consequences.
Read the full paper for the defense-in-depth architecture: deterministic controls, model-level robustness, monitoring/red teaming, privacy controls, and governance processes.

arXiv — Lifecycle & Application-Stack Survey of LLM Vulnerabilities (WPI SoK)

What it is

Why it matters

What to do

Sources