arXiv — Securing LLM Agents Needs Intent-to-Execution Integrity (2605.16976)

AI relevance: This position paper identifies a foundational gap in LLM agent security — the lack of a formal correctness property defining what "secure execution" means — and maps existing defenses against four integrity requirements that any agentic system must satisfy.

  • Researchers from NUS, UCLA, and Berkeley (Qu et al., May 2026) argue that LLM agent security requires a formal intent-to-execution integrity property — analogous to compiler correctness — that specifies when an agent's actions faithfully reflect the user's intent.
  • They identify four integrity properties that must all hold together (they are conjunctive): Tool Integrity (tools do what they claim), Instruction Integrity (user instructions aren't hijacked), Judgment Integrity (the model's decisions aren't poisoned), and Data Flow Integrity (data moves only along intended paths).
  • The paper highlights systems like OpenClaw with open skill ecosystems and third-party tool integrations, where tools cannot be assumed trusted — breaking the implicit assumption underlying most existing agent-security defenses.
  • An analysis of existing defenses (NemoClaw, SeClaw, SafeClaw-R, SecureClaw) finds only partial, non-compositional coverage: each stacks mechanisms without stating the correctness property those mechanisms are meant to achieve.
  • The authors draw a compiler analogy: just as a compiler must preserve program semantics from source to binary, an LLM agent must preserve user intent from natural language to tool-call execution. Security violations are "mis-executions" in this pipeline.
  • The paper catalogs real-world evidence of supply-chain risks in agent tool ecosystems: the ClawHavoc campaign (1,000+ malicious ClawHub skills), a 42,447-skill audit finding 26.1% with vulnerabilities, and documented sandbox bypasses.
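
As a rough illustration of what "conjunctive" means in practice, the sketch below models the four properties as an enum, treats an execution as intact only if every property holds, and lists which properties a defense stack leaves unaddressed. The defense names and their coverage claims are hypothetical and not taken from the paper. Covering each property at least once is only a necessary condition: the paper's point about non-compositional coverage is that mechanisms can still leave gaps between them.

```python
from enum import Enum


class Integrity(Enum):
    TOOL = "tool"                # tools do what they claim
    INSTRUCTION = "instruction"  # user instructions are not hijacked
    JUDGMENT = "judgment"        # the model's decisions are not poisoned
    DATA_FLOW = "data_flow"      # data moves only along intended paths


def uncovered(defenses: dict[str, set[Integrity]]) -> set[Integrity]:
    """Return the properties that no listed defense claims to address."""
    covered = set().union(*defenses.values())
    return set(Integrity) - covered


def execution_intact(verdicts: dict[Integrity, bool]) -> bool:
    """Conjunctive check: a missing or failed property fails the whole run."""
    return all(verdicts.get(p, False) for p in Integrity)


# Hypothetical defense stack; names and coverage claims are illustrative only.
stack = {
    "sandbox": {Integrity.DATA_FLOW},
    "injection_filter": {Integrity.INSTRUCTION},
}
gaps = uncovered(stack)
print(sorted(p.value for p in gaps))  # ['judgment', 'tool']
```

Here the stack says nothing about tool or judgment integrity, so any run would fail the conjunctive check no matter how strong the sandbox or the filter is.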

Why it matters

The agentic AI security field is drowning in point solutions (sandboxing, injection filters, policy enforcement) without a unified theory of what security means for agents. This paper proposes a framework for evaluating whether a combination of defenses actually closes all attack paths or leaves exploitable gaps between mechanisms.

What to do

  • Read the paper and map your agent architecture against the four integrity properties — identify which are unaddressed.
  • Treat third-party tools as untrusted by default, especially in open-skill ecosystems.
  • Audit how your defenses compose, looking for gaps where several partial defenses, taken together, still leave an attack path open.
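
One way to read "untrusted by default" is a default-deny gate in front of tool loading. The sketch below is a minimal example under that assumption: a tool runs only if its code matches a digest pinned after review. The `ToolGate` class, the tool names, and the review step are all hypothetical. Pinning covers only one slice of tool integrity (the code has not changed since review), not whether the reviewed tool actually does what it claims.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Default-deny gate: a tool is allowed only if it matches a pinned digest."""
    pinned: dict[str, str] = field(default_factory=dict)  # tool name -> sha256 hex

    def pin(self, name: str, code: bytes) -> None:
        # Assumed to be called only after a human has reviewed the tool's code.
        self.pinned[name] = hashlib.sha256(code).hexdigest()

    def allowed(self, name: str, code: bytes) -> bool:
        expected = self.pinned.get(name)
        if expected is None:
            return False  # unknown tool: untrusted by default
        return hashlib.sha256(code).hexdigest() == expected


gate = ToolGate()
reviewed = b"def run(query): return query.upper()"
gate.pin("shout", reviewed)

print(gate.allowed("shout", reviewed))                     # True
print(gate.allowed("shout", reviewed + b"  # tampered"))   # False
print(gate.allowed("unknown_skill", b"anything"))          # False
```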

Sources