arXiv: Semantic Compliance Hijacking — Payload-less Skill Attacks on AI Agents

2026-05-20 Security by al-ice.ai Editorial

AI relevance: This paper introduces a new supply-chain attack class against AI agent skills that bypasses all current scanning tools by encoding malicious intent as natural-language compliance rules — no executable payload required.

What the paper shows

Researchers introduce Semantic Compliance Hijacking (SCH): a payload-less supply-chain attack that translates malicious goals into unstructured natural-language instructions disguised as compliance rules within agent skill files.
The agent's own generative capabilities synthesize and execute the unauthorized code at runtime — the attacker never ships executable payloads.
SCH achieved 77.67% success rate for confidentiality breaches and 67.33% for remote code execution across the most vulnerable configurations.
Tested against three mainstream agent frameworks and three foundation models, including OpenClaw and Claude Code-style architectures.
Multi-Skill Automated Optimization (MS-AO) further boosted attack efficacy by iteratively refining the natural-language instructions for maximum agent compliance.
Because malicious skills contain no recognizable AST signatures or explicit harmful intents, they maintained a 0.00% detection rate against current security scanning tools.
The paper references the "ClawHavoc" incident (Koi Security, 2026), a real-world large-scale poisoning campaign targeting the OpenClaw skill marketplace that covertly exfiltrated cloud credentials.
The core insight: skill description files are authoritative operational directives for agents, not passive documentation — agents parse, trust, and act on these natural-language documents.

Why it matters

This is the first formal demonstration that agent skill marketplaces (ClawHub, Claude skills, GitHub MCP registries) have a blind spot that code-scanning tools cannot detect.
The attack works because agent skills use natural language as code — and natural language instructions are the attack payload.
Combined with the ClawHavoc incident, this shows the threat is already materializing in production agent ecosystems.
The research calls for a paradigm shift from signature-based detection to semantic intent validation for agent skill security.

What to do

Treat agent skill files (SKILL.md, AGENTS.md, tool descriptions) as executable code — review them with the same rigor as pull requests.
Implement semantic intent validation: flag skills whose natural-language instructions request actions disproportionate to their stated purpose.
Restrict skill installations to vetted sources and require manual review before installing third-party agent skills in production environments.
Advocate for skill marketplace operators to implement behavior-based monitoring, not just static file scanning.

Sources: