arXiv: Neutral Prompting Attacks — Stealthy Hallucination Steering in Agent Skills
AI relevance: LLM coding agents that generate dependency installation commands are now a software supply chain attack vector — this paper demonstrates that semantically benign prompts can covertly steer hallucinated package names, letting attackers register those names and compromise downstream installs.
- Paper: "Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent Skills" (arXiv:2605.29354, submitted May 28, 2026, under review)
- Introduces Neutral Prompting Attack (NPA) — prompts that contain no explicit malicious intent but increase the model's propensity to hallucinate non-existent package names
- Unlike targeted dependency steering, NPA doesn't specify an attacker-chosen package; it shifts the model's entire dependency generation distribution toward speculative names
- Evaluated across multiple coding-oriented LLMs and package hallucination benchmarks
- NPA increases both Hallucination ASR and Pip Install ASR, changes the distribution of hallucinated names, and evades existing static-analysis, LLM-based, and agent-based Skill defenses
- Attack vector: attacker registers hallucinated package names on PyPI/npm, then waits for agents to install them under NPA-influenced prompts
- Key insight: defenses focused on detecting obviously malicious prompts miss this attack because the prompts themselves are semantically benign (e.g., "be imaginative" or "exhaustively consider options")
Why it matters
This represents a third class of agent supply-chain attack beyond direct prompt injection and malicious skill injection. NPA exploits the agent's creativity/exhaustiveness directives — features that are encouraged by design. Organizations using coding agents for dependency management now face a scenario where the most natural prompting patterns ("think broadly", "explore all options") become attack enablers. The attack bypasses all three known defense categories tested.
What to do
- Implement package name validation in agent workflows before any
pip installornpm installexecutes - Cross-reference generated package names against known registries before installation
- Restrict agent tool permissions to prevent unvalidated package installs
- Review agent system prompts for creativity/exhaustiveness directives that could increase hallucination surface
Sources: