arXiv — SkillCloak Bypasses 90%+ of Agent Skill Scanners With Payload-Preserving Evasion
AI relevance: Agent skill marketplaces now host 40,000+ community-contributed skills, and this research shows that static scanners fail to detect malicious skills when attackers apply semantics-preserving transformations, leaving agent deployments exposed to credential theft and code exfiltration.
- Researchers introduced SkillCloak, a payload-preserving evasion framework that transforms malicious agent skills while maintaining their attack functionality
- Two evasion strategies: Structural Obfuscation rewrites visible payload indicators (shell commands, URLs, credential paths) into semantically equivalent forms; Self-Extracting Skill (SFS) Packing hides malicious components in ignored directories or encoded blobs, restoring them during agent execution
- Tested against 8 representative skill scanners and 1,613 in-the-wild malicious skills from ClawHub archives
- SFS Packing achieved a bypass rate above 90% against all 8 scanners; Structural Obfuscation achieved bypass rates above 80% on most static scanners and 96% on a hybrid scanner
- Cloaked skills remained fully functional across both Codex and Claude Code agents with no statistically detectable loss in utility
- The paper proposes SkillDetonate, a behavior-centric runtime auditor that executes skills in a sandbox and detects malicious effects from OS-boundary information-flow evidence rather than install-time appearance
- SkillDetonate achieved a 97% detection rate at a 2% false-positive rate and sustained 87% detection on real-world evasive malicious skills
- Attack surface is broad: malicious payloads can be embedded in SKILL.md natural-language instructions, bundled scripts, or auxiliary resources that agents interpret and execute with full workspace privileges
- Real-world impact: the ClawHavoc campaign planted 300+ malicious skills on a single marketplace, harvesting browser credentials, keychain passwords, SSH keys, and cryptocurrency wallets
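To put the reported detection figures in operational terms, the sketch below works out what fraction of alerts would be true positives at a given prevalence of malicious skills. The 97% detection rate and 2% false-positive rate come from the summary above. The prevalence is purely an illustrative assumption: it combines two unrelated figures from this summary (300 malicious skills, a 40,000-skill catalogue) and is not a number reported by the paper.

```python
def alert_precision(detection_rate: float, false_positive_rate: float,
                    prevalence: float) -> float:
    """Fraction of flagged skills that are actually malicious (Bayes' rule)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    true_alerts = detection_rate * prevalence             # malicious and flagged
    false_alerts = false_positive_rate * (1.0 - prevalence)  # benign but flagged
    total = true_alerts + false_alerts
    return true_alerts / total if total else 0.0


# ASSUMPTION: illustrative prevalence only, not a figure from the paper.
ASSUMED_PREVALENCE = 300 / 40_000

precision = alert_precision(0.97, 0.02, ASSUMED_PREVALENCE)
print(f"Share of alerts that are real under the assumed prevalence: {precision:.1%}")
```

Under that assumed prevalence, roughly one alert in four would correspond to a malicious skill, so even a low false-positive rate can mean most alerts need triage when malicious skills are rare. A different prevalence changes the result substantially.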
Why it matters
Agent skill ecosystems are software supply chains. Once installed, a skill executes with the agent's full privileges — access to local files, credentials, package managers, terminals, and external services. Static scanners based on pattern matching or LLM-as-judge analysis cannot withstand adaptive adversaries who rewrite payloads while preserving semantics. The gap between install-time auditing and runtime behavior is the attack surface.
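The weakness of pattern matching can be illustrated with a toy example. The scanner below is hypothetical: its signatures and function names are invented for illustration and do not correspond to any of the eight scanners in the study. It flags a literal credential path but misses a shell string that a POSIX shell would resolve to the same path after quote concatenation.

```python
import re
import shlex

# ASSUMPTION: a made-up signature list standing in for a pattern-based scanner.
SIGNATURES = [
    re.compile(r"\.ssh/id_rsa"),
    re.compile(r"\.aws/credentials"),
]


def naive_scan(text: str) -> bool:
    """Return True if any signature matches the raw text."""
    return any(sig.search(text) for sig in SIGNATURES)


def normalized_scan(command: str) -> bool:
    """Scan after POSIX-style word splitting, which removes quoting."""
    return naive_scan(" ".join(shlex.split(command)))


plain = "cat ~/.ssh/id_rsa"
rewritten = "cat ~/.s'sh'/id_'rsa'"  # same words once the shell strips quotes

print(shlex.split(plain) == shlex.split(rewritten))  # both parse to the same words
print(naive_scan(plain), naive_scan(rewritten))      # raw-text matching diverges
print(normalized_scan(rewritten))                    # normalization recovers this case
```

Normalizing before matching closes this one gap only. Each additional rewrite (variable expansion, encoding, indirection through a helper file) needs its own normalization step, which is why the summary above points toward observing behavior at runtime instead.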
What to do
- Do not rely solely on static skill scanners or marketplace vetting for agent skill security
- Implement runtime auditing that observes skill execution in sandboxed environments with taint tracking across files, processes, and network operations
- Pin skill versions and audit SKILL.md instructions before installation; treat community-contributed skills as untrusted by default
- Monitor for OS-boundary information flows: credential file access, outbound network requests, shell command execution from skill contexts
- Review the SkillDetonate methodology for behavior-centric detection patterns applicable to your agent deployment
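As a minimal sketch of the flow-monitoring idea in the list above, the following uses Python's built-in audit hooks (`sys.addaudithook`) to record file opens, outbound connections, and subprocess launches, and raises an alert when a network connection follows an open of a credential-like path. This is not SkillDetonate and not a sandbox: it sees only events raised inside one Python interpreter, and the class name, path fragments, and flow rule are illustrative assumptions.

```python
import sys

# ASSUMPTION: path fragments treated as credential material; tune per environment.
SENSITIVE_FRAGMENTS = (".ssh/", ".aws/credentials", "Keychains/", "wallet")


class FlowAuditor:
    """Records boundary events and flags 'credential open, then network' flows."""

    def __init__(self):
        self.active = False
        self.events = []           # (event name, detail) tuples, in arrival order
        self.sensitive_opens = []  # credential-like paths opened so far
        self.alerts = []

    def hook(self, event, args):
        if not self.active:
            return
        if event == "open":                  # args: (path, mode, flags)
            path = str(args[0])
            self.events.append((event, path))
            if any(frag in path for frag in SENSITIVE_FRAGMENTS):
                self.sensitive_opens.append(path)
        elif event == "socket.connect":      # args: (socket, address)
            address = args[1]
            self.events.append((event, repr(address)))
            if self.sensitive_opens:
                self.alerts.append(
                    f"connect to {address!r} after opening {self.sensitive_opens}"
                )
        elif event == "subprocess.Popen":    # args: (executable, args, cwd, env)
            self.events.append((event, str(args[0])))

    def install(self):
        # Audit hooks cannot be removed once added, hence the 'active' switch.
        sys.addaudithook(self.hook)
```

A caller would create a `FlowAuditor`, call `install()`, set `active = True` around the code under observation, and then inspect `alerts` and `events`. The flow rule is deliberately coarse (any connection after any sensitive open), so it will over-report; real taint tracking would follow the data itself across files, processes, and sockets.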
Sources: