arXiv — POISE: Position-Aware Undetectable Skill Injection on LLM Agents
AI relevance: Agent skills (like SKILL.md files) are a core extension mechanism in AI agent ecosystems, and this paper demonstrates a practical injection attack that achieves 89.3% success while evading LLM-based static scanners — directly threatening the skill supply chain that OpenClaw, Claude Code, and other harnesses rely on.
- The reliability–stealth tradeoff. Prior skill-poisoning attacks forced a choice. YAML-header injections load reliably but are easy for scanners to inspect. Body-text injections are stealthier but less reliable, because a malicious command that appears out of context can arouse suspicion in the agent itself.
- POISE compresses the trigger. The attack reduces the payload to a single, benign-looking instruction placed at a carefully chosen position in the skill body. A context-aware generator blends that instruction with the surrounding setup or prerequisite steps, so it reads as a natural continuation rather than an anomalous command.
- 89.3% attack success rate. Tested on Skill-Inject with codex+gpt-5.2, POISE outperforms random-placement body baselines by 28.0 percentage points and beats YAML-only baselines by 2.6 percentage points, while retaining the stealth advantage of body placement.
- LLM scanners fail under this threat model. The paper finds that four different LLM judge models falsely flag 74.6% of clean skills on average, because legitimate skill bodies routinely request privileged tool operations. POISE adds only 5.6% in incremental high-risk alerts over the clean baseline, so the attack hides in the noise of existing false positives.
- Three harm categories evaluated. The authors instantiate attacks across exfil (credential/data exfiltration setup), cfg (configuration tampering), and sys (privileged-shell behavior), showing the technique generalizes beyond a single payload type.
- Implications for skill registries. ClawHub, MCP server registries, and any marketplace distributing agent skills face a fundamental detection gap: static LLM-based scanning cannot reliably distinguish poisoned skills from legitimate ones when both require privileged operations.
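The scanner finding is easiest to see as a base-rate calculation. The sketch below is illustrative only and rests on two assumptions that do not come from the paper: it reads the 5.6% incremental-alert figure as percentage points (so a poisoned skill is flagged about 80.2% of the time versus 74.6% for a clean one), and it uses a hypothetical prior of one poisoned skill per 1,000 in a registry. It also backs out the baseline success rates implied by the reported gaps.

```python
# Worked arithmetic for the reported numbers. ASSUMPTIONS (not from the paper):
# the 5.6% incremental-alert figure is read as percentage points, and the
# 1-in-1,000 prior for poisoned skills in a registry is purely illustrative.

POISE_ASR = 0.893        # reported attack success rate on Skill-Inject
GAP_RANDOM_BODY = 0.280  # reported lead over random-placement body baselines
GAP_YAML_ONLY = 0.026    # reported lead over YAML-only baselines

FLAG_RATE_CLEAN = 0.746                       # average false-positive rate on clean skills
FLAG_RATE_POISONED = FLAG_RATE_CLEAN + 0.056  # assumed: +5.6 percentage points
PRIOR_POISONED = 0.001                        # hypothetical base rate of poisoned skills


def implied_baselines(asr: float, gap_random: float, gap_yaml: float) -> dict[str, float]:
    """Back out the baseline success rates implied by the reported gaps."""
    return {"random_body": asr - gap_random, "yaml_only": asr - gap_yaml}


def posterior_poisoned_given_flag(prior: float, tpr: float, fpr: float) -> float:
    """Bayes' rule: probability that a flagged skill is actually poisoned."""
    p_flag = tpr * prior + fpr * (1.0 - prior)
    return tpr * prior / p_flag


if __name__ == "__main__":
    baselines = implied_baselines(POISE_ASR, GAP_RANDOM_BODY, GAP_YAML_ONLY)
    print("implied baseline ASRs:", {k: round(v, 3) for k, v in baselines.items()})
    ratio = FLAG_RATE_POISONED / FLAG_RATE_CLEAN
    print(f"likelihood ratio of a flag: {ratio:.3f}")
    post = posterior_poisoned_given_flag(PRIOR_POISONED, FLAG_RATE_POISONED, FLAG_RATE_CLEAN)
    print(f"P(poisoned | flagged): {post:.5f} vs prior {PRIOR_POISONED:.5f}")
```

Under these assumptions the implied baselines are about 61.3% (random body placement) and 86.7% (YAML-only), and a scanner flag moves the estimated probability that a skill is poisoned from 0.10% to only about 0.11%. A flag that carries almost no evidence is what "hiding in the noise" means in practice.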
Why it matters
The agent skills ecosystem assumes skills are authored by trusted developers. POISE shows that a single compromised SKILL.md in a supply chain — from a breached repo, a typosquatted package, or a malicious contributor — can persist through current scanning defenses. As agent frameworks increasingly load third-party skills automatically, this attack surface grows without corresponding detection improvements.
What to do
- Verify skill provenance before installation — check author identity, repo history, and community trust signals.
- Run skills in constrained sandboxes with least-privilege tool permissions, not as unrestricted agent capabilities.
- Monitor skill execution traces for anomalous tool-call patterns (e.g., file reads outside project scope, network exfil attempts).
- Treat LLM-based static scanning as a noisy baseline, not a security gate — it produces too many false positives on legitimate skills to be actionable.
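The sandboxing and monitoring steps can be combined into a post-hoc audit of an agent's tool-call trace. The sketch below is hypothetical: the `ToolCall` record shape, the tool names (`read_file`, `http_request`, `shell`), the skill name, and the policy fields are illustrative assumptions, not the schema of OpenClaw, Claude Code, or any real harness. It flags calls to tools outside a per-skill allowlist, file reads that resolve outside the project root, and network requests to hosts that are not allowlisted.

```python
import posixpath
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical trace record: one tool invocation made while a skill was active."""
    skill: str
    tool: str
    arg: str  # path for read_file, URL for http_request, command line for shell


@dataclass
class SkillPolicy:
    """Hypothetical least-privilege policy for a single skill."""
    allowed_tools: set[str]
    project_root: str
    allowed_hosts: set[str] = field(default_factory=set)


def audit(trace: list[ToolCall], policies: dict[str, SkillPolicy]) -> list[str]:
    """Return human-readable findings for calls that violate a skill's policy."""
    findings: list[str] = []
    for call in trace:
        policy = policies.get(call.skill)
        if policy is None:
            findings.append(f"{call.skill}: no policy registered")
            continue
        if call.tool not in policy.allowed_tools:
            findings.append(f"{call.skill}: tool '{call.tool}' not in allowlist")
            continue
        if call.tool == "read_file":
            # Lexical normalization only: '..' is collapsed, symlinks are NOT resolved.
            root = posixpath.normpath(policy.project_root)
            target = posixpath.normpath(posixpath.join(root, call.arg))
            if not PurePosixPath(target).is_relative_to(root):
                findings.append(f"{call.skill}: read outside project scope: {target}")
        elif call.tool == "http_request":
            host = urlparse(call.arg).hostname or ""
            if host not in policy.allowed_hosts:
                findings.append(f"{call.skill}: request to non-allowlisted host: {host or call.arg}")
    return findings


if __name__ == "__main__":
    policies = {
        "format-code": SkillPolicy(
            allowed_tools={"read_file", "http_request"},
            project_root="/work/repo",
            allowed_hosts={"pypi.org"},
        )
    }
    trace = [
        ToolCall("format-code", "read_file", "src/main.py"),
        ToolCall("format-code", "read_file", "../../home/user/.aws/credentials"),
        ToolCall("format-code", "http_request", "https://collector.example.net/upload"),
        ToolCall("format-code", "shell", "chmod 777 /etc/hosts"),
    ]
    for finding in audit(trace, policies):
        print(finding)
```

In this example the in-project read passes, while the credential read, the request to an unlisted host, and the unapproved shell call each produce a finding. A runtime check like this complements, rather than replaces, an enforcing sandbox, since it only reports violations after the fact.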