arXiv: SkillCloak Bypasses Agent Skill Scanners at 90%+ Rates With Payload-Preserving Evasion

AI relevance: Agent skill marketplaces now host 40,000+ community-contributed skills, and this research shows that static scanners fail to detect malicious skills when attackers apply semantics-preserving transformations, leaving agent deployments exposed to credential theft and code exfiltration.

  • Researchers introduced SkillCloak, a payload-preserving evasion framework that transforms malicious agent skills while maintaining their attack functionality
  • Two evasion strategies: Structural Obfuscation rewrites visible payload indicators (shell commands, URLs, credential paths) into semantically equivalent forms; Self-Extracting Skill (SFS) Packing hides malicious components in ignored directories or encoded blobs, restoring them during agent execution
  • Tested against 8 representative skill scanners and 1,613 in-the-wild malicious skills from ClawHub archives
  • SFS Packing evaded all 8 scanners with bypass rates above 90%; Structural Obfuscation achieved bypass rates above 80% against most static scanners and 96% against a hybrid scanner
  • Cloaked skills remained fully functional across both Codex and Claude Code agents with no statistically detectable loss in utility
  • Paper proposes SkillDetonate, a behavior-centric runtime auditor that executes skills in a sandbox and detects malicious effects through OS-boundary information-flow evidence rather than install-time appearance
  • SkillDetonate achieved a 97% detection rate at a 2% false-positive rate and sustained 87% detection on real-world evasive malicious skills
  • Attack surface is broad: malicious payloads can be embedded in SKILL.md natural-language instructions, bundled scripts, or auxiliary resources that agents interpret and execute with full workspace privileges
  • Real-world impact: ClawHavoc campaign planted 300+ malicious skills on a single marketplace, harvesting browser credentials, keychain passwords, SSH keys, and cryptocurrency wallets
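Because packing relies on content that scanners skip, one cheap pre-install check is to inventory exactly those places. The sketch below is a minimal illustration, not the paper's method: the function names, the 120-character threshold, and the idea of treating dot-directories as "ignored" are all assumptions made for this example.

```python
import base64
import binascii
import os
import re
from pathlib import Path

# Assumption: an unbroken base64 run of 120+ characters is worth a closer look.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{120,}={0,2}")


def find_hidden_paths(skill_dir: Path) -> list[str]:
    """Return files that are dotfiles or live under a dot-directory."""
    hits = []
    for root, _dirs, files in os.walk(skill_dir):
        for name in files:
            rel = Path(root, name).relative_to(skill_dir)
            if any(part.startswith(".") for part in rel.parts):
                hits.append(rel.as_posix())
    return sorted(hits)


def find_encoded_blobs(text: str) -> list[str]:
    """Return long base64 runs that decode cleanly (candidate packed content)."""
    blobs = []
    for match in BASE64_RUN.finditer(text):
        candidate = match.group(0)
        try:
            base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64, so ignore this run
        blobs.append(candidate)
    return blobs
```

A hit from either function is only a reason to look closer. Legitimate skills can ship dotfiles or embedded assets, and an attacker can choose other encodings, so this complements runtime auditing rather than replacing it.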

Why it matters

Agent skill ecosystems are software supply chains. Once installed, a skill executes with the agent's full privileges: access to local files, credentials, package managers, terminals, and external services. Static scanners based on pattern matching or LLM-as-judge analysis cannot withstand adaptive adversaries who rewrite payloads while preserving semantics. The gap between install-time auditing and runtime behavior is the attack surface.
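The weakness of literal pattern matching can be seen in a toy example. The scanner, its two rules, and the `resolve` helper below are hypothetical and far simpler than any real product; the point is only that two texts with the same meaning can differ in whether a regex matches them.

```python
import re

# Hypothetical indicator list; real scanners use far richer rule sets.
INDICATORS = [
    re.compile(r"\.ssh/id_rsa"),
    re.compile(r"curl\s+[^|]+\|\s*(sh|bash)"),
]


def static_scan(skill_text: str) -> bool:
    """Return True if any known indicator appears literally in the text."""
    return any(pattern.search(skill_text) for pattern in INDICATORS)


def resolve(assignment: str) -> str:
    """Join the quoted pieces on the right-hand side of a toy assignment."""
    rhs = assignment.split("=", 1)[1]
    return "".join(re.findall(r'"([^"]*)"', rhs))


# Two lines that resolve to the same path but look different to a regex.
plain = 'key_path = "~/.ssh/id_rsa"'
rewritten = 'key_path = "~/" + ".s" + "sh/" + "id_" + "rsa"'

print(static_scan(plain), static_scan(rewritten))   # True False
print(resolve(plain) == resolve(rewritten))         # True
```

Adding a rule for this particular rewrite does not close the gap, since the space of equivalent forms is effectively unbounded. That is the argument for judging skills by what they do at runtime.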

What to do

  • Do not rely solely on static skill scanners or marketplace vetting for agent skill security
  • Implement runtime auditing that observes skill execution in sandboxed environments with taint tracking across files, processes, and network operations
  • Pin skill versions and audit SKILL.md instructions before installation; treat community-contributed skills as untrusted by default
  • Monitor for OS-boundary information flows: credential file access, outbound network requests, shell command execution from skill contexts
  • Review the SkillDetonate methodology for behavior-centric detection patterns applicable to your agent deployment
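The monitoring advice above can be made concrete with a small rule over a sandbox event trace. This is a minimal sketch under stated assumptions, not SkillDetonate's implementation: the `Event` record, the event kinds, and the sensitive-path globs are invented for illustration, and a real auditor would collect events from OS-level tracing rather than a hand-built list.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical sensitive-path globs; adjust to the deployment.
SENSITIVE_GLOBS = ["*/.ssh/*", "*/.aws/credentials", "*/Keychains/*"]


@dataclass(frozen=True)
class Event:
    pid: int
    kind: str    # "read", "spawn", or "connect"
    target: str  # file path, child pid as text, or host:port


def find_flows(events: list[Event]) -> list[tuple[int, str, str]]:
    """Flag (pid, source_path, destination) when a tainted process connects out."""
    tainted: dict[int, str] = {}  # pid -> first sensitive path it touched
    alerts = []
    for ev in events:
        if ev.kind == "read" and any(fnmatchcase(ev.target, g) for g in SENSITIVE_GLOBS):
            tainted.setdefault(ev.pid, ev.target)
        elif ev.kind == "spawn" and ev.pid in tainted:
            # Children inherit the parent's taint (a coarse over-approximation).
            tainted.setdefault(int(ev.target), tainted[ev.pid])
        elif ev.kind == "connect" and ev.pid in tainted:
            alerts.append((ev.pid, tainted[ev.pid], ev.target))
    return alerts


trace = [
    Event(10, "read", "/home/u/.ssh/id_rsa"),
    Event(10, "spawn", "11"),
    Event(11, "connect", "203.0.113.5:443"),
    Event(12, "read", "/home/u/notes.txt"),
    Event(12, "connect", "198.51.100.7:443"),
]
print(find_flows(trace))
```

Process 11 is flagged because it inherits taint from process 10, which read an SSH key; process 12 is not, since it never touched a sensitive path. Process-level taint is deliberately coarse and will over-report, which is usually an acceptable trade for a sandbox audit that runs before a skill is trusted.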

Sources: