CSO Online — Pen Tests: AI Security Flaws 2.5× More Severe Than Legacy Bugs
AI relevance: Real-world pentest data confirms AI/LLM systems are producing a disproportionate share of critical vulnerabilities — and teams lack established playbooks to fix them.
- Cobalt's 2026 State of Pentesting Report finds 32% of all AI/LLM findings rated high-risk, compared to 13% for traditional enterprise apps — a 2.5× ratio.
- LLM vulnerabilities have the lowest remediation rate of any application type: only 38% of high-risk AI findings are fixed, reflecting fragmented ownership across engineering, security, legal, and business teams.
- One in five organizations reported an LLM security incident in the past year; another 18% were unsure and 19% declined to answer.
- Prompt injection, ranked #1 in the OWASP Top 10 for LLM Applications, has surged 540% year-over-year in HackerOne bug bounty reports.
- Experts cite three drivers: immature security controls for AI systems, larger blast radius when agents connect to internal knowledge bases and tools, and no established remediation playbook for AI-specific flaws.
- Adrian Furtuna (Pentest-Tools.com): developers know how to fix SQL injection or XXE, but "when they see a prompt injection chain or an insecure tool call boundary, they often don't [have a playbook]." A minimal sketch of such a boundary follows below.
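To make the term concrete, here is a minimal Python sketch of an insecure tool call boundary next to a hardened counterpart. The function names, tool names, and file paths are hypothetical illustrations, not drawn from the report.

```python
# Hypothetical illustration of an insecure tool call boundary: the agent
# forwards a model-generated argument into a privileged function without
# validation, so a prompt-injected instruction becomes a real action.

from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str       # tool the model asked to invoke
    argument: str   # model-generated argument (attacker-influenceable)

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

# INSECURE: any path the model emits is read verbatim. Text the agent
# ingested (a web page, a ticket, an email) can steer this argument.
def dispatch_insecure(call: ToolCall) -> str:
    if call.name == "read_file":
        return read_file(call.argument)
    raise ValueError(f"unknown tool: {call.name}")

# HARDENED: validate the model-supplied argument against an explicit
# allow-list before crossing the trust boundary into real side effects.
ALLOWED_PATHS = {"docs/readme.md", "docs/changelog.md"}

def dispatch_hardened(call: ToolCall) -> str:
    if call.name != "read_file":
        raise ValueError(f"unknown tool: {call.name}")
    if call.argument not in ALLOWED_PATHS:
        raise PermissionError(f"blocked path: {call.argument}")
    return read_file(call.argument)
```

The hardened dispatcher applies the same discipline as classic input validation: model output crossing into privileged code is treated as untrusted, exactly as user input is in a SQL injection fix.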
Why it matters
The data moves AI security risk from theoretical to measured. When nearly a third of AI findings are high-severity and fewer than two in five get fixed, organizations deploying agents with tool access are accumulating unremediated risk at scale. The remediation gap — not just the discovery rate — is the critical metric.
What to do
- Treat AI system findings with the same SLA discipline as traditional high-severity bugs — assign clear ownership, not cross-team ambiguity.
- Develop internal playbooks for AI-specific vulnerability classes: prompt injection chains, insecure tool call boundaries, and over-permitted agent integrations (see the manifest sketch after this list).
- Scope pentests to include agent tool-access paths, not just the model endpoint.
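As one starting point for those playbooks, a deny-by-default tool manifest keeps agent integrations from becoming over-permitted. The sketch below is illustrative only; the tool names, scope strings, and manifest shape are assumptions, not an established API.

```python
# Hypothetical sketch of a least-privilege tool manifest for an agent.
# Names and scopes are illustrative, not from the Cobalt report.

TOOL_MANIFEST = {
    # read-only tools: lower blast radius
    "search_kb":  {"scopes": ["kb:read"],   "needs_approval": False},
    # tools with side effects: explicit scopes plus human approval
    "send_email": {"scopes": ["mail:send"], "needs_approval": True},
    "run_query":  {"scopes": ["db:read"],   "needs_approval": True},
}

# What this particular deployment actually grants.
GRANTED_SCOPES = {"kb:read", "db:read"}

def authorize(tool_name: str) -> bool:
    """Deny by default; allow only manifest-listed tools whose scopes are granted."""
    entry = TOOL_MANIFEST.get(tool_name)
    if entry is None:
        return False
    return all(scope in GRANTED_SCOPES for scope in entry["scopes"])

assert authorize("search_kb")
assert not authorize("send_email")   # mail:send was never granted
assert not authorize("delete_repo")  # unknown tools fail closed
```

The design choice worth copying is the default: unknown tools and ungranted scopes fail closed, so every new integration must be explicitly reviewed before an agent can reach it.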