Axis Intelligence — AI Model Vulnerability Tracker: 71% Attack Success Rate Across Six Frontier Models

AI relevance: Independent testing of 312 attack vectors against six production-deployed frontier models reveals that indirect prompt injection via agent tool inputs succeeds 84% of the time — the most fragile attack category — while system-prompt extraction still works in 31% of deployments across GPT-5, Claude Opus 4.7, Gemini 3, Llama 4, DeepSeek V4, and Mistral Large 3.

What happened

  • Axis Intelligence released the first iteration of its AI Model Vulnerability Tracker, a living database of independently reproduced LLM vulnerabilities across ChatGPT, Claude, Gemini, Llama, Mistral, and DeepSeek.
  • Between January 15 and April 25, 2026, they tested 312 distinct attack vectors against six production models. 71% of attacks succeeded against at least one model; 23% succeeded against all six.
  • The most fragile attack category was indirect prompt injection delivered through agent tool inputs, with an 84% success rate — confirming that the integration points where agents consume external data are the weakest security surface.
  • The most surprising finding: system-prompt extraction succeeded in 31% of tested deployments, despite being one of the oldest and best-documented LLM attack classes.
  • The most resilient category was direct policy-violating jailprompts, which models refused 77% of the time.
  • Notable individual findings included indirect injection via PDF annotation layers, multi-turn crescendo bypasses via pedagogical framing, system-prompt extraction via translation requests, and tool-result injection in RAG pipelines.
  • The tracker uses the Axis Vulnerability Index (AVI) for severity scoring and is updated weekly with lab-confirmed results.

Why it matters

This is one of the largest cross-model vulnerability studies published to date, and the results are sobering: the vast majority of tested attack vectors work against at least one frontier model, and the attack surface most relevant to AI agent deployments — indirect injection through tool inputs — is the most fragile. The persistence of system-prompt extraction at 31% success suggests that years of defense work have not resolved a fundamental design limitation in how models process mixed-instruction contexts.

What to do

  • Treat all tool results as adversarial input — never assume agent tool outputs (API responses, web fetches, database queries) are safe to pass directly into model context without sanitization.
  • Test your deployment against the tracker's published categories — especially indirect injection and system-prompt extraction, which have the highest success rates across models.
  • Implement output filtering on agent tool calls — intercept and validate agent actions before they execute, not just after.
  • Monitor the tracker at axis-intelligence.com/research/ai-model-vulnerability-tracker for new attack vectors as they are reproduced.

Sources