Axis Intelligence — AI Model Vulnerability Tracker: 71% Attack Success Rate Across Six Frontier Models
AI relevance: Independent testing of 312 attack vectors against six production-deployed frontier models reveals that indirect prompt injection via agent tool inputs succeeds 84% of the time — the most fragile attack category — while system-prompt extraction still works in 31% of deployments across GPT-5, Claude Opus 4.7, Gemini 3, Llama 4, DeepSeek V4, and Mistral Large 3.
What happened
- Axis Intelligence released the first iteration of its AI Model Vulnerability Tracker, a living database of independently reproduced LLM vulnerabilities across ChatGPT, Claude, Gemini, Llama, Mistral, and DeepSeek.
- Between January 15 and April 25, 2026, they tested 312 distinct attack vectors against six production models. 71% of attacks succeeded against at least one model; 23% succeeded against all six.
- The most fragile attack category was indirect prompt injection delivered through agent tool inputs, with an 84% success rate — confirming that the integration points where agents consume external data are the weakest security surface.
- The most surprising finding: system-prompt extraction succeeded in 31% of tested deployments, despite being one of the oldest and best-documented LLM attack classes.
- The most resilient category was direct policy-violating jailprompts, which models refused 77% of the time.
- Notable individual findings included indirect injection via PDF annotation layers, multi-turn crescendo bypasses via pedagogical framing, system-prompt extraction via translation requests, and tool-result injection in RAG pipelines.
- The tracker uses the Axis Vulnerability Index (AVI) for severity scoring and is updated weekly with lab-confirmed results.
Why it matters
This is one of the largest cross-model vulnerability studies published to date, and the results are sobering: the vast majority of tested attack vectors work against at least one frontier model, and the attack surface most relevant to AI agent deployments — indirect injection through tool inputs — is the most fragile. The persistence of system-prompt extraction at 31% success suggests that years of defense work have not resolved a fundamental design limitation in how models process mixed-instruction contexts.
What to do
- Treat all tool results as adversarial input — never assume agent tool outputs (API responses, web fetches, database queries) are safe to pass directly into model context without sanitization.
- Test your deployment against the tracker's published categories — especially indirect injection and system-prompt extraction, which have the highest success rates across models.
- Implement output filtering on agent tool calls — intercept and validate agent actions before they execute, not just after.
- Monitor the tracker at axis-intelligence.com/research/ai-model-vulnerability-tracker for new attack vectors as they are reproduced.