arXiv: Neutral Prompting Attacks — Stealthy Hallucination Steering in Agent Skills

2026-06-01 Security by al-ice.ai Editorial

AI relevance: LLM coding agents that generate dependency installation commands are now a software supply chain attack vector — this paper demonstrates that semantically benign prompts can covertly steer hallucinated package names, letting attackers register those names and compromise downstream installs.

Paper: "Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent Skills" (arXiv:2605.29354, submitted May 28, 2026, under review)
Introduces Neutral Prompting Attack (NPA) — prompts that contain no explicit malicious intent but increase the model's propensity to hallucinate non-existent package names
Unlike targeted dependency steering, NPA doesn't specify an attacker-chosen package; it shifts the model's entire dependency generation distribution toward speculative names
Evaluated across multiple coding-oriented LLMs and package hallucination benchmarks
NPA increases both Hallucination ASR and Pip Install ASR (attack success rates), changes the distribution of hallucinated names, and evades existing static-analysis, LLM-based, and agent-based Skill defenses
Attack vector: attacker registers hallucinated package names on PyPI/npm, then waits for agents to install them under NPA-influenced prompts
Key insight: defenses focused on detecting obviously malicious prompts miss this attack because the prompts themselves are semantically benign (e.g., "be imaginative" or "exhaustively consider options")

Why it matters

This is a third class of agent supply-chain attack, distinct from direct prompt injection and malicious skill injection. NPA exploits creativity and exhaustiveness directives, which agent designers encourage on purpose. Organizations using coding agents for dependency management now face a scenario in which the most natural prompting patterns ("think broadly", "explore all options") become attack enablers. In the paper's evaluation, the attack evaded all three defense categories tested: static-analysis, LLM-based, and agent-based Skill defenses.

What to do

Implement package name validation in agent workflows before any pip install or npm install executes
Cross-reference generated package names against known registries before installation
Restrict agent tool permissions to prevent unvalidated package installs
Review agent system prompts for creativity/exhaustiveness directives that could increase hallucination surface
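
As an illustration of the first three recommendations, here is a minimal sketch of a pre-install gate, not a method taken from the paper. It assumes the agent's proposed shell command can be intercepted as a string before it runs. The names APPROVED_PACKAGES, requested_packages, and check_install are hypothetical, and the allowlist contents are placeholders. The sketch checks names against a reviewed allowlist rather than merely confirming that a name exists on the registry, because the attack described above relies on the attacker having already registered the hallucinated name.

```python
import re
import shlex

# Hypothetical allowlist. In practice this might be derived from a reviewed
# lockfile or an internal package index; the names here are placeholders.
APPROVED_PACKAGES = {"requests", "numpy", "pandas"}

# Characters that separate a project name from extras or a version specifier.
_SPEC_SPLIT = re.compile(r"[\[<>=!~; ]")


def normalize(name: str) -> str:
    """Normalize a project name (lowercase; runs of '-', '_', '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requested_packages(command: str) -> list[str]:
    """Extract candidate package names from a simple `pip install ...` command."""
    tokens = shlex.split(command)
    if "install" not in tokens:
        return []
    names = []
    for arg in tokens[tokens.index("install") + 1:]:
        if arg.startswith("-"):
            continue  # skip flags such as --upgrade
        # Values that follow flags (e.g. a requirements file) are kept as
        # candidate names, so they fail the allowlist check: fail closed.
        names.append(normalize(_SPEC_SPLIT.split(arg, maxsplit=1)[0]))
    return names


def check_install(command: str, approved=APPROVED_PACKAGES) -> tuple[bool, list[str]]:
    """Return (allowed, rejected_names) for a proposed install command."""
    approved_norm = {normalize(p) for p in approved}
    rejected = [n for n in requested_packages(command) if n not in approved_norm]
    return (not rejected, rejected)


if __name__ == "__main__":
    print(check_install("pip install requests 'numpy>=1.26'"))  # (True, [])
    print(check_install("pip install fastjson-utils-pro"))      # (False, ['fastjson-utils-pro'])
```

A gate like this would sit between the agent and its shell tool, so a rejected name is escalated to a human reviewer instead of being installed. It only handles pip-style commands; an npm equivalent would need its own parsing.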

Sources: