arXiv — Poise: Undetectable Skill Injection on LLM Agents

2026-06-10 Research by al-ice.ai Editorial

AI relevance: Poise demonstrates a new skill-poisoning technique against agentic tool ecosystems (OpenClaw skills, Claude Code plugins) that achieves 89.3% attack success rate while evading four different LLM-based security scanners — exposing a structural gap in how AI skills are distributed and verified.

What happened

Position-aware injection. Poise compresses a malicious trigger into a single benign-looking body instruction within the skill document, placing it at a "feasible position" and using a context-aware generator to blend it with nearby setup steps — making it look like normal skill prose.
89.3% attack success rate on Skill-Inject benchmark (codex + gpt-5.2), 28.0 points above random-placement body baselines and 2.6 points above YAML-header-only baselines.
Stealth by false-positive blending. LLM scanners already flag 74.6% of clean (benign) skills as high-risk across four judges. Poise causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines — the attack hides inside the scanner's existing false-positive noise.
Attack Success Rate metric. Unlike prior work that measures raw payload execution, Poise evaluates attacks by whether the payload executes AND the user's legitimate task still succeeds — a stricter, more realistic threshold.
Supply-chain implications. Skills are portable, modular packages (SKILL.md + helper scripts) installed from public marketplaces. A poisoned skill persists across sessions and can steer agents from within legitimate workflows.

Why it matters

This is the first paper to show that skill-poisoning can be both highly effective and practically stealthy against current defenses. The false-positive blending insight is critical: defenders can't simply "scan more aggressively" when scanners already misclassify three-quarters of clean skills. The result suggests that static/LLM-based scanning alone cannot secure AI skill distribution channels.

What to do

Treat all publicly distributed agent skills as untrusted code — audit before installing
Pin skill versions and use curated/internal registries instead of public marketplaces for production workloads
Implement behavioral monitoring: detect when a skill causes tool calls that deviate from the user's stated intent, even if the task succeeds
Consider the position-aware attack vector when designing custom skill scanners — inspect the entire document, not just headers

arXiv — Poise: Undetectable Skill Injection on LLM Agents

What happened

Why it matters

What to do

Sources