arXiv — SkillCloak Bypasses 90%+ of Agent Skill Scanners With Payload-Preserving Evasion

2026-07-05 Security by al-ice.ai Editorial

AI relevance: Agent skill marketplaces now host 40,000+ community-contributed skills, and this research shows that static scanners fail to detect malicious skills when attackers apply semantics-preserving transformations, leaving agent deployments exposed to credential theft and code exfiltration.

Researchers introduced SkillCloak, a payload-preserving evasion framework that transforms malicious agent skills while maintaining their attack functionality
Two evasion strategies: Structural Obfuscation rewrites visible payload indicators (shell commands, URLs, credential paths) into semantically equivalent forms; Self-Extracting Skill (SFS) Packing hides malicious components in ignored directories or encoded blobs, restoring them during agent execution
Tested against 8 representative skill scanners and 1,613 in-the-wild malicious skills from ClawHub archives
SFS Packing evaded all 8 scanners with bypass rates above 90%; Structural Obfuscation reached bypass rates above 80% against most static scanners and 96% against a hybrid scanner
Cloaked skills remained fully functional across both Codex and Claude Code agents with no statistically detectable loss in utility
The paper proposes SkillDetonate, a behavior-centric runtime auditor that executes skills in a sandbox and detects malicious effects through OS-boundary information-flow evidence rather than install-time appearance
SkillDetonate achieved a 97% detection rate at a 2% false-positive rate and sustained 87% detection on real-world evasive malicious skills
Attack surface is broad: malicious payloads can be embedded in SKILL.md natural-language instructions, bundled scripts, or auxiliary resources that agents interpret and execute with full workspace privileges
Real-world impact: the ClawHavoc campaign planted 300+ malicious skills on a single marketplace, harvesting browser credentials, keychain passwords, SSH keys, and cryptocurrency wallets
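
The behavior-centric idea described above can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: it assumes a hypothetical sandbox that already emits an ordered trace of events, and the Event type, the SENSITIVE_MARKERS list, and find_flows are names invented here for illustration. It applies coarse process-level taint: a process that reads a sensitive file is tainted, taint passes to the processes it spawns, and any outbound connection from a tainted process is reported as a flow.

```python
from dataclasses import dataclass

# Hypothetical sensitive path fragments; a real auditor would use a richer policy.
SENSITIVE_MARKERS = (".ssh/", ".aws/credentials", "Keychains/")

@dataclass(frozen=True)
class Event:
    pid: int
    kind: str    # "read", "spawn", or "connect"
    target: str  # file path, child pid (as a string), or "host:port"

def find_flows(trace):
    """Return (pid, source_path, destination) for each connect made by a tainted process."""
    tainted = {}  # pid -> first sensitive path that tainted it
    flows = []
    for ev in trace:
        if ev.kind == "read" and any(m in ev.target for m in SENSITIVE_MARKERS):
            tainted.setdefault(ev.pid, ev.target)
        elif ev.kind == "spawn" and ev.pid in tainted:
            # Taint propagates from the parent to the child process.
            tainted.setdefault(int(ev.target), tainted[ev.pid])
        elif ev.kind == "connect" and ev.pid in tainted:
            flows.append((ev.pid, tainted[ev.pid], ev.target))
    return flows

trace = [
    Event(10, "read", "/home/u/.ssh/id_rsa"),
    Event(10, "spawn", "11"),
    Event(11, "connect", "203.0.113.5:443"),  # tainted child connects out
    Event(12, "connect", "example.org:443"),  # unrelated process, not flagged
]
flows = find_flows(trace)
```

The check depends only on what the skill did at the OS boundary, so rewriting or packing the payload does not change the verdict as long as the same read and connect still occur. Process-level taint over-approximates and would need tuning to keep false positives low.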

Why it matters

Agent skill ecosystems are software supply chains. Once installed, a skill executes with the agent's full privileges: access to local files, credentials, package managers, terminals, and external services. Static scanners based on pattern matching or LLM-as-judge analysis cannot withstand adaptive adversaries who rewrite payloads while preserving semantics. The gap between install-time auditing and runtime behavior is the attack surface.
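
A toy example shows why literal pattern matching is brittle. This sketch assumes a hypothetical scanner with two hand-written signatures; it does not represent any of the scanners evaluated in the paper, and the names SIGNATURES and static_scan are illustrative only. Two source lines that yield the same string at run time receive different verdicts.

```python
import re

# Hypothetical signatures for a naive install-time scanner (illustrative only).
SIGNATURES = [re.compile(r"\.ssh/id_rsa"), re.compile(r"curl\s+https?://")]

def static_scan(source_text):
    """Return True if any signature appears literally in the skill's source text."""
    return any(sig.search(source_text) for sig in SIGNATURES)

# Two source lines that produce the same string when executed.
literal_line = 'path = "~/.ssh/id_rsa"'
rewritten_line = 'path = "~/" + ".s" + "sh/" + "id_" + "rsa"'

# The concatenation really is the same value at run time.
same_value = ("~/" + ".s" + "sh/" + "id_" + "rsa") == "~/.ssh/id_rsa"

flagged_literal = static_scan(literal_line)      # signature matches the literal path
flagged_rewritten = static_scan(rewritten_line)  # no signature matches the split string
```

The scanner inspects the text of the skill, while the agent acts on the value the text evaluates to; any check that relies on surface form inherits this gap.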

What to do

Do not rely solely on static skill scanners or marketplace vetting for agent skill security
Implement runtime auditing that observes skill execution in sandboxed environments with taint tracking across files, processes, and network operations
Pin skill versions and audit SKILL.md instructions before installation; treat community-contributed skills as untrusted by default
Monitor for OS-boundary information flows: credential file access, outbound network requests, shell command execution from skill contexts
Review the SkillDetonate methodology for behavior-centric detection patterns applicable to your agent deployment
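
Version pinning can be sketched as a content digest over the whole skill directory. This is one possible approach under stated assumptions, not a marketplace feature: skill_digest and matches_pin are hypothetical helper names, and the digest deliberately covers hidden files and directories, since packed payloads may sit in locations that scanners skip.

```python
import hashlib
from pathlib import Path

def skill_digest(skill_dir):
    """SHA-256 over every regular file's relative path and bytes, hidden files included."""
    root = Path(skill_dir)
    h = hashlib.sha256()
    # pathlib's rglob("*") also yields dotfiles and descends into dot-directories.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so file boundaries cannot shift unnoticed.
        h.update(len(rel).to_bytes(8, "big"))
        h.update(rel)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

def matches_pin(skill_dir, pinned_digest):
    """Return True if the installed skill still matches the digest recorded at audit time."""
    return skill_digest(skill_dir) == pinned_digest
```

Record the digest when a skill is audited and compare it before each load. A mismatch means some file changed, including files outside SKILL.md; a match says nothing about whether the audited version was benign in the first place.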

Sources: