arXiv: Neutral Prompting Attacks — Stealthy Hallucination Steering in Agent Skills

AI relevance: LLM coding agents that generate dependency installation commands are now a software supply chain attack vector — this paper demonstrates that semantically benign prompts can covertly steer hallucinated package names, letting attackers register those names and compromise downstream installs.

  • Paper: "Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent Skills" (arXiv:2605.29354, submitted May 28, 2026, under review)
  • Introduces Neutral Prompting Attack (NPA) — prompts that contain no explicit malicious intent but increase the model's propensity to hallucinate non-existent package names
  • Unlike targeted dependency steering, NPA doesn't specify an attacker-chosen package; it shifts the model's entire dependency generation distribution toward speculative names
  • Evaluated across multiple coding-oriented LLMs and package hallucination benchmarks
  • NPA increases both Hallucination ASR and Pip Install ASR, changes the distribution of hallucinated names, and evades existing static-analysis, LLM-based, and agent-based Skill defenses
  • Attack vector: attacker registers hallucinated package names on PyPI/npm, then waits for agents to install them under NPA-influenced prompts
  • Key insight: defenses focused on detecting obviously malicious prompts miss this attack because the prompts themselves are semantically benign (e.g., "be imaginative" or "exhaustively consider options")

Why it matters

This represents a third class of agent supply-chain attack beyond direct prompt injection and malicious skill injection. NPA exploits the agent's creativity/exhaustiveness directives — features that are encouraged by design. Organizations using coding agents for dependency management now face a scenario where the most natural prompting patterns ("think broadly", "explore all options") become attack enablers. The attack bypasses all three known defense categories tested.

What to do

  • Implement package name validation in agent workflows before any pip install or npm install executes
  • Cross-reference generated package names against known registries before installation
  • Restrict agent tool permissions to prevent unvalidated package installs
  • Review agent system prompts for creativity/exhaustiveness directives that could increase hallucination surface

Sources: