arXiv — Securing LLM Agents Needs Intent-to-Execution Integrity (2605.16976)
AI relevance: This position paper identifies a foundational gap in LLM agent security — the lack of a formal correctness property defining what "secure execution" means — and maps existing defenses against four integrity requirements that any agentic system must satisfy.
- Researchers from NUS, UCLA, and Berkeley (Qu et al., May 2026) argue that LLM agent security requires a formal intent-to-execution integrity property — analogous to compiler correctness — that specifies when an agent's actions faithfully reflect the user's intent.
- They identify four conjunctive integrity properties, meaning all four must hold at once for an execution to count as secure: Tool Integrity (tools do what they claim), Instruction Integrity (user instructions aren't hijacked), Judgment Integrity (the model's decisions aren't poisoned), and Data Flow Integrity (data moves only along intended paths).
- The paper highlights systems like OpenClaw with open skill ecosystems and third-party tool integrations, where tools cannot be assumed trusted — breaking the implicit assumption underlying most existing agent-security defenses.
- Analysis of existing defenses (NemoClaw, SeClaw, SafeClaw-R, SecureClaw) reveals only partial, non-compositional coverage — each stacks mechanisms without defining what correctness property those mechanisms are intended to achieve.
- The authors draw a compiler analogy: just as a compiler must preserve program semantics from source to binary, an LLM agent must preserve user intent from natural language to tool-call execution. Security violations are "mis-executions" in this pipeline.
- The paper catalogs real-world evidence of supply-chain risks in agent tool ecosystems: the ClawHavoc campaign (1,000+ malicious ClawHub skills), a 42,447-skill audit finding 26.1% with vulnerabilities, and documented sandbox bypasses.
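The conjunctive reading of the four properties can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the paper's formalism: the `IntegrityReport` class, its field names, and the idea of recording each property as a boolean are all hypothetical. It only shows that a single failed property is enough to make the whole execution count as a mis-execution.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IntegrityReport:
    """Hypothetical per-execution record; one flag per integrity property."""
    tool_integrity: bool          # tools did what they claim
    instruction_integrity: bool   # user instructions were not hijacked
    judgment_integrity: bool      # model decisions were not poisoned
    data_flow_integrity: bool     # data moved only along intended paths

    def violated(self) -> list[str]:
        """Names of the properties that failed, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def holds(self) -> bool:
        """Conjunctive check: true only if no property is violated."""
        return not self.violated()


# Three properties hold, but data leaked along an unintended path.
report = IntegrityReport(
    tool_integrity=True,
    instruction_integrity=True,
    judgment_integrity=True,
    data_flow_integrity=False,
)
print(report.holds())     # False
print(report.violated())  # ['data_flow_integrity']
```

How each flag would actually be established (static analysis, runtime monitoring, provenance tracking) is outside the scope of this sketch.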
Why it matters
The agentic AI security field is drowning in point solutions (sandboxing, injection filters, policy enforcement) without a unified theory of what security means for agents. This paper provides the missing framework to evaluate whether defense compositions actually close all attack paths — or leave exploitable gaps between mechanisms.
What to do
- Read the paper and map your agent architecture against the four integrity properties — identify which are unaddressed.
- Treat third-party tools as untrusted by default, especially in open-skill ecosystems.
- Audit defense compositions for gaps where multiple partial defenses still leave a combined attack path open.
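A first pass at the checklist above might look like the following Python sketch. Everything in it is hypothetical: the defense names, the property labels, the coverage assigned to each defense, and the tool names are placeholders to be replaced with an assessment of your own stack. Note that having every property covered by some mechanism is necessary but not sufficient, since the paper's point is that coverage may not compose.

```python
# Short labels for the four integrity properties (labels are this sketch's own).
PROPERTIES = frozenset({"tool", "instruction", "judgment", "data_flow"})


def unaddressed(coverage: dict[str, set[str]]) -> set[str]:
    """Return the properties that no deployed defense claims to address."""
    covered = set().union(*coverage.values())
    unknown = covered - PROPERTIES
    if unknown:
        raise ValueError(f"unknown property labels: {sorted(unknown)}")
    return set(PROPERTIES) - covered


def is_tool_allowed(tool_name: str, reviewed_tools: set[str]) -> bool:
    """Default-deny: a third-party tool is untrusted unless explicitly reviewed."""
    return tool_name in reviewed_tools


# Illustrative (assumed) mapping of deployed defenses to the properties they target.
deployed = {
    "sandbox": {"data_flow"},
    "injection_filter": {"instruction"},
}
print(sorted(unaddressed(deployed)))  # ['judgment', 'tool']

reviewed = {"calendar_lookup"}  # hypothetical allowlist of vetted tools
print(is_tool_allowed("calendar_lookup", reviewed))    # True
print(is_tool_allowed("pdf_summarizer_v2", reviewed))  # False
```

An empty result from `unaddressed` should be treated as a prompt to examine the seams between mechanisms, not as evidence that every attack path is closed.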