arXiv — BadSkill: Agent Supply Chain Backdoor Attacks via Model-in-Skill Poisoning

AI relevance: This research identifies a critical supply chain vulnerability in AI agent ecosystems: third-party skills can bundle poisoned models that behave maliciously only under specific trigger conditions. Because the payload lives in model weights rather than in prompts, it bypasses traditional prompt injection detection. The attack targets skill ecosystems such as OpenClaw and was evaluated in an OpenClaw-inspired simulation.

  • BadSkill Attack — New supply chain threat targeting agent skills with embedded models
  • Mechanism — Skills bundle backdoor-fine-tuned models that activate only on semantic trigger combinations
  • Stealth — Behaves benignly during normal operation and activates only on attacker-chosen parameter combinations
  • Success Rate — Up to 99.5% attack success rate across 8 triggered skills
  • Poison Efficiency — 3% poison rate achieves 91.7% attack success
  • Scale Resilience — Effective across model sizes from 494M to 7.1B parameters
  • Robustness — Remains effective under text perturbations
  • Evaluation — Tested on 13 skills with 571 negative-class and 396 trigger-aligned queries
  • Platform — OpenClaw-inspired simulation environment used for evaluation
  • Impact — Potentially affects any agent ecosystem that installs third-party skills with embedded models
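A simplified model helps explain why a trigger tied to a combination of parameters stays hidden during ordinary testing. The sketch below assumes, hypothetically, that a skill has a few discrete parameters, that the backdoor fires on exactly one combination of their values, and that a reviewer samples probe queries uniformly at random. It computes how likely such probing is to miss the trigger. The function names and parameter sizes are illustrative and do not come from the paper, and real semantic triggers are not uniformly distributed.

```python
import math
from typing import Sequence


def trigger_hit_probability(values_per_param: Sequence[int]) -> float:
    """Probability that one uniformly random probe matches a single fixed
    combination of parameter values (one value per parameter)."""
    return 1.0 / math.prod(values_per_param)


def miss_probability(values_per_param: Sequence[int], n_probes: int) -> float:
    """Probability that n independent uniform probes all miss the trigger."""
    p = trigger_hit_probability(values_per_param)
    if p >= 1.0:
        return 0.0 if n_probes > 0 else 1.0  # a single combination is always hit
    # log1p keeps precision when p is tiny (i.e. many combinations)
    return math.exp(n_probes * math.log1p(-p))


def probes_for_detection(values_per_param: Sequence[int], confidence: float) -> int:
    """Smallest number of uniform probes that hits the trigger at least once
    with probability >= confidence, for 0 < confidence < 1."""
    p = trigger_hit_probability(values_per_param)
    if p >= 1.0:
        return 1  # a single combination: any probe triggers it
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical skill with three parameters taking 20, 50 and 10 values.
    sizes = [20, 50, 10]
    print(f"miss rate after 1,000 probes: {miss_probability(sizes, 1_000):.3f}")
    print(f"probes for 95% detection: {probes_for_detection(sizes, 0.95)}")
```

Under these assumptions (10,000 combinations), 1,000 random probes miss the trigger roughly 90% of the time, and about 30,000 probes are needed for a 95% chance of hitting it once. This is an intuition aid for the stealth claim, not an estimate of the detection effort against the skills in the paper.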

Why it matters

AI agent ecosystems increasingly rely on installable third-party skills to extend functionality, which creates a new supply chain attack surface. Unlike prompt injection, which manipulates an agent's behavior at runtime, BadSkill embeds malicious behavior directly into a skill's bundled model during development. An attacker can therefore publish a seemingly benign skill whose payload activates only when specific semantic conditions are met. Because the malicious logic lives in model weights rather than in readable code, it is extremely hard to catch through conventional security reviews.

What to do

AI platform developers and security teams should implement stronger provenance verification and behavioral vetting for third-party skill artifacts. Because the research shows that even a low poison rate (3%) can yield high attack success (91.7%), the following measures are warranted:

  • Enhanced skill artifact provenance and code signing verification
  • Runtime behavioral monitoring for skill execution patterns
  • Model integrity checking for skills bundling machine learning artifacts
  • Supply chain security reviews for third-party agent skills
  • Isolation mechanisms for skills with embedded models
  • Regular security audits of popular third-party skills
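One concrete form of the provenance and model-integrity checks listed above is to pin every file a skill ships, including its bundled model, to a SHA-256 digest recorded at review or signing time, and to refuse to load the skill if anything differs. The sketch below assumes a hypothetical pin list that maps relative paths to hex digests. The function names and directory layout are illustrative, not an existing platform API, and the sketch is not a substitute for real code signing.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List, Mapping

CHUNK_SIZE = 1 << 20  # hash large model files in 1 MiB chunks


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_skill_artifacts(skill_dir: Path, pinned: Mapping[str, str]) -> List[str]:
    """Compare every file in a skill directory against a pin list.

    `pinned` maps POSIX-style relative paths to expected hex digests
    (a hypothetical format). Returns a sorted list of problems; an empty
    list means every file is pinned and matches its digest.
    """
    problems = []
    for rel, expected in pinned.items():
        path = skill_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(path) != expected.lower():
            problems.append(f"digest mismatch: {rel}")
    # Any file the pin list does not cover (e.g. a swapped-in model) is suspect.
    for path in skill_dir.rglob("*"):
        if path.is_file():
            rel = path.relative_to(skill_dir).as_posix()
            if rel not in pinned:
                problems.append(f"unpinned: {rel}")
    return sorted(problems)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp)
        model = skill / "model" / "weights.bin"  # hypothetical bundled model
        model.parent.mkdir()
        model.write_bytes(b"reviewed weights")
        pins = {"model/weights.bin": sha256_file(model)}
        print(verify_skill_artifacts(skill, pins))  # prints []
        model.write_bytes(b"swapped weights")
        print(verify_skill_artifacts(skill, pins))  # prints ['digest mismatch: model/weights.bin']
```

Digest pinning only proves that the installed artifact is byte-identical to the one that was reviewed. A BadSkill-style model is poisoned before publication, so a pinned or signed backdoored model still passes this check; it has to be paired with behavioral vetting of the model itself.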
