arXiv: Semantic Compliance Hijacking — Payload-less Skill Attacks on AI Agents
AI relevance: This paper introduces a new supply-chain attack class against AI agent skills that bypasses all current scanning tools by encoding malicious intent as natural-language compliance rules — no executable payload required.
What the paper shows
- Researchers introduce Semantic Compliance Hijacking (SCH): a payload-less supply-chain attack that translates malicious goals into unstructured natural-language instructions disguised as compliance rules within agent skill files.
- The agent's own generative capabilities synthesize and execute the unauthorized code at runtime — the attacker never ships executable payloads.
- SCH achieved 77.67% success rate for confidentiality breaches and 67.33% for remote code execution across the most vulnerable configurations.
- Tested against three mainstream agent frameworks and three foundation models, including OpenClaw and Claude Code-style architectures.
- Multi-Skill Automated Optimization (MS-AO) further boosted attack efficacy by iteratively refining the natural-language instructions for maximum agent compliance.
- Because malicious skills contain no recognizable AST signatures or explicit harmful intents, they maintained a 0.00% detection rate against current security scanning tools.
- The paper references the "ClawHavoc" incident (Koi Security, 2026), a real-world large-scale poisoning campaign targeting the OpenClaw skill marketplace that covertly exfiltrated cloud credentials.
- The core insight: skill description files are authoritative operational directives for agents, not passive documentation — agents parse, trust, and act on these natural-language documents.
Why it matters
- This is the first formal demonstration that agent skill marketplaces (ClawHub, Claude skills, GitHub MCP registries) have a blind spot that code-scanning tools cannot detect.
- The attack works because agent skills use natural language as code — and natural language instructions are the attack payload.
- Combined with the ClawHavoc incident, this shows the threat is already materializing in production agent ecosystems.
- The research calls for a paradigm shift from signature-based detection to semantic intent validation for agent skill security.
What to do
- Treat agent skill files (SKILL.md, AGENTS.md, tool descriptions) as executable code — review them with the same rigor as pull requests.
- Implement semantic intent validation: flag skills whose natural-language instructions request actions disproportionate to their stated purpose.
- Restrict skill installations to vetted sources and require manual review before installing third-party agent skills in production environments.
- Advocate for skill marketplace operators to implement behavior-based monitoring, not just static file scanning.
Sources: