arXiv — BadSkill: Agent Supply Chain Backdoor Attacks via Model-in-Skill Poisoning

2026-04-01 Security by al-ice.ai Editorial

AI relevance: This research identifies a critical supply chain vulnerability in AI agent ecosystems: third-party skills can bundle poisoned models that activate malicious behavior only under specific conditions. Because the payload sits in the bundled model and not in a prompt, it bypasses traditional prompt injection detection. The evaluation used an OpenClaw-inspired simulation, so platforms like OpenClaw are among those potentially affected.

BadSkill Attack — New supply chain threat targeting agent skills with embedded models
Mechanism — Skills bundle backdoor-fine-tuned models that activate only on semantic trigger combinations
Stealth — Appears benign during normal operation, activates only on attacker-chosen parameters
Success Rate — Up to 99.5% attack success rate across 8 triggered skills
Poison Efficiency — 3% poison rate achieves 91.7% attack success
Scale Resilience — Effective across model sizes from 494M to 7.1B parameters
Robustness — Maintains effectiveness under text perturbation attacks
Evaluation — Tested on 13 skills with 571 negative-class and 396 trigger-aligned queries
Platform — OpenClaw-inspired simulation environment used for evaluation
Impact — Affects any agent ecosystem using third-party skills with embedded models
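
The success-rate figures above are ratios over the two query sets. The sketch below shows one way such ratios could be computed. It assumes that attack success rate is the fraction of trigger-aligned queries on which the payload activates, and that a false-activation rate is the same fraction over negative-class queries. The paper's exact definitions may differ, and the QueryOutcome record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QueryOutcome:
    """Hypothetical record of one evaluation query (not from the paper)."""
    trigger_aligned: bool    # True if the query matches the attacker's trigger
    payload_activated: bool  # True if the skill's model behaved maliciously


def activation_rate(outcomes: Sequence[QueryOutcome], trigger_aligned: bool) -> float:
    """Fraction of queries in the chosen class on which the payload activated."""
    selected = [o for o in outcomes if o.trigger_aligned == trigger_aligned]
    if not selected:
        raise ValueError("no queries in the requested class")
    return sum(o.payload_activated for o in selected) / len(selected)


def attack_success_rate(outcomes: Sequence[QueryOutcome]) -> float:
    """Assumed definition: activation rate over trigger-aligned queries."""
    return activation_rate(outcomes, trigger_aligned=True)


def false_activation_rate(outcomes: Sequence[QueryOutcome]) -> float:
    """Assumed definition: activation rate over negative-class queries."""
    return activation_rate(outcomes, trigger_aligned=False)
```

A stealthy backdoor of the kind described would score high on the first ratio and close to zero on the second, which is why testing a skill only on benign queries says little about its safety.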

Why it matters

AI agent ecosystems increasingly rely on installable third-party skills to extend functionality, creating a new supply chain attack surface. Unlike prompt injection attacks that manipulate runtime behavior, BadSkill embeds malicious behavior directly into skill-bundled models during development. Attackers can therefore publish seemingly benign skills that activate their payload only when specific semantic conditions are met, which makes the backdoor extremely hard to detect through conventional security reviews.

What to do

AI platform developers and security teams should implement stronger provenance verification and behavioral vetting for third-party skill artifacts. The research shows that a poison rate as low as 3% achieves 91.7% attack success, which calls for:

Enhanced skill artifact provenance and code signing verification
Runtime behavioral monitoring for skill execution patterns
Model integrity checking for skills bundling machine learning artifacts
Supply chain security reviews for third-party agent skills
Isolation mechanisms for skills with embedded models
Regular security audits of popular third-party skills
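
The model integrity item can be illustrated with a small check that compares every file in an installed skill against pinned SHA-256 digests. This is a minimal sketch under stated assumptions: the pinned mapping (relative path to hex digest), the directory layout, and the function names are hypothetical and do not come from the paper or from any specific platform.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_skill_artifacts(skill_dir: Path, pinned: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file matches.

    `pinned` maps relative POSIX paths to expected SHA-256 hex digests
    (a hypothetical manifest format assumed for this sketch).
    """
    present = {
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file()
    }
    problems = []
    for rel_path, expected in sorted(pinned.items()):
        if rel_path not in present:
            problems.append(f"missing: {rel_path}")
        elif sha256_file(skill_dir / rel_path) != expected.lower():
            problems.append(f"hash mismatch: {rel_path}")
    # Files that are present but not pinned are reported rather than trusted.
    for rel_path in sorted(present - set(pinned)):
        problems.append(f"unlisted file: {rel_path}")
    return problems
```

A check like this only detects changes made after the digests were pinned. It cannot reveal a model that was already poisoned when its publisher released it, so it complements behavioral vetting and isolation and does not replace them.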
