arXiv — Agent-Sentry bounding LLM agents
AI relevance: Agent-Sentry targets a core agent security problem: stopping LLM-driven tool execution when runtime behavior drifts beyond the user’s intended task.
- The paper introduces Agent-Sentry, a runtime control layer for agentic systems that learns expected execution behavior from prior traces.
- Its core idea is to bound agent behavior instead of treating every possible tool sequence as acceptable just because the model can generate it.
- The framework builds functionality graphs that capture recurring control flow and data provenance across benign and adversarial runs.
- At runtime, Agent-Sentry compares proposed actions against those learned patterns and blocks out-of-bounds tool calls.
- The design adds a trusted intent-alignment check for cases where structural traces alone are ambiguous.
- The authors position this as a defense against indirect prompt injection and other compromises that push agents into irrelevant or dangerous actions.
- On the paper’s benchmarks, Agent-Sentry reportedly blocks over 90% of attacks while preserving up to 98% utility.
- The threat model maps well to email, workflow, and enterprise assistant agents where read → decide → act chains can be hijacked by untrusted content.
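
As a rough illustration of the bounding idea, the sketch below learns which tool-to-tool transitions occur in benign traces and rejects any proposed call that would create an unseen transition. It is a deliberately simplified stand-in and not the paper's functionality-graph construction: it ignores data provenance and the intent-alignment check, and the tool names and the functions `learn_transitions` and `is_in_bounds` are hypothetical.

```python
from collections import defaultdict

START = "<start>"  # sentinel for "no tool has been called yet" (assumption)

def learn_transitions(traces):
    """Collect every (previous tool -> next tool) edge seen in benign traces."""
    allowed = defaultdict(set)
    for trace in traces:
        prev = START
        for tool in trace:
            allowed[prev].add(tool)
            prev = tool
    return allowed

def is_in_bounds(allowed, history, proposed_tool):
    """Return True only if the proposed call follows an edge seen in training."""
    prev = history[-1] if history else START
    return proposed_tool in allowed.get(prev, set())

# Hypothetical benign traces for an email assistant.
benign_traces = [
    ["read_email", "summarize", "reply_email"],
    ["read_email", "search_calendar", "reply_email"],
]
allowed = learn_transitions(benign_traces)

# A call that matches a learned pattern passes; an unseen one is rejected.
print(is_in_bounds(allowed, ["read_email"], "summarize"))
print(is_in_bounds(allowed, ["read_email"], "send_payment"))
```

A first-order transition check like this is easy to audit but coarse. A real deployment would likely need parameter and data-flow information as well, which is the gap the paper's provenance tracking is meant to address.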
Why it matters
- Most agent defenses still rely on prompts, allowlists, or coarse output filters; those are weak when the dangerous step is an otherwise valid tool invocation.
- Execution provenance is a useful angle because agent compromise often shows up as an anomalous action sequence, not just toxic text.
- The work is a practical reminder that safe agents need runtime policy enforcement, not just better model alignment.
What to do
- Log execution traces: capture tool sequence, parameters, and data flow so you can define what “normal” looks like.
- Gate high-risk actions: put approvals or hard checks around file writes, outbound network access, credential reads, and privilege changes.
- Test with injected context: benchmark agents against indirect prompt injection scenarios rather than measuring only normal task success.
- Prefer bounded agents: if an agent has a narrow job, enforce a narrow execution envelope rather than leaving the full toolset open.
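
The logging, gating, and bounding recommendations above can be combined in a thin wrapper around tool dispatch. The following is a minimal sketch under stated assumptions, not an implementation from the paper: the `BoundedToolRunner` class, the `HIGH_RISK` set, the tool names, and the `approve` callback are all hypothetical.

```python
import time

# Hypothetical names for actions that should require approval.
HIGH_RISK = {"write_file", "http_post", "read_credentials", "change_privileges"}

class BoundedToolRunner:
    """Dispatch tool calls inside a narrow envelope and log every attempt."""

    def __init__(self, tools, allowed_tools, approve):
        self.tools = tools                        # name -> callable
        self.allowed_tools = set(allowed_tools)   # the agent's execution envelope
        self.approve = approve                    # callback: (name, params) -> bool
        self.trace = []                           # in-memory execution trace

    def call(self, name, **params):
        if name not in self.allowed_tools:
            decision = "blocked_out_of_envelope"
        elif name in HIGH_RISK and not self.approve(name, params):
            decision = "blocked_not_approved"
        else:
            decision = "allowed"
        # Record the attempt before executing, so blocked calls are logged too.
        self.trace.append(
            {"ts": time.time(), "tool": name, "params": params, "decision": decision}
        )
        if decision != "allowed":
            raise PermissionError(f"{name}: {decision}")
        return self.tools[name](**params)

# Hypothetical tools; the approver here denies every high-risk request.
tools = {
    "read_email": lambda msg_id: f"body of {msg_id}",
    "write_file": lambda path, text: len(text),
}
runner = BoundedToolRunner(
    tools,
    allowed_tools={"read_email", "write_file"},
    approve=lambda name, params: False,
)
print(runner.call("read_email", msg_id="42"))
```

Logging blocked attempts alongside allowed ones matters here: the denied calls are often the clearest signal that injected content tried to steer the agent.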