arXiv — Agent-Sentry bounding LLM agents

2026-03-27 Research by al-ice.ai Editorial

AI relevance: Agent-Sentry targets a core agent security problem: stopping LLM-driven tool execution when runtime behavior drifts beyond the user’s intended task.

The paper introduces Agent-Sentry, a runtime control layer for agentic systems that learns expected execution behavior from prior traces.
Its core idea is to bound agent behavior instead of treating every possible tool sequence as acceptable just because the model can generate it.
The framework builds functionality graphs that capture recurring control flow and data provenance across benign and adversarial runs.
At runtime, Agent-Sentry compares proposed actions against those learned patterns and blocks out-of-bounds tool calls.
The design adds a trusted intent-alignment check for cases where structural traces alone are ambiguous.
The authors position this as a defense against indirect prompt injection and other compromises that push agents into irrelevant or dangerous actions.
On the paper’s benchmarks, Agent-Sentry reportedly blocks over 90% of attacks while preserving up to 98% utility.
The threat model maps well to email, workflow, and enterprise assistant agents where read → decide → act chains can be hijacked by untrusted content.
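
To make the bounding idea concrete, here is a minimal sketch that learns which tool-to-tool transitions appear in benign traces and rejects any proposed call outside that set. It is a deliberately simplified stand-in, not the paper's functionality-graph construction: it ignores data provenance and the intent-alignment check, and the class name `TransitionBounds` and the tool names are hypothetical.

```python
from collections import defaultdict


class TransitionBounds:
    """Toy execution envelope: the set of tool-to-tool transitions
    observed in benign traces (an assumption, not the paper's method)."""

    START = "<start>"  # sentinel for "no tool has run yet"

    def __init__(self):
        # Maps a previous tool name to the set of tools seen right after it.
        self.allowed = defaultdict(set)

    def learn(self, trace):
        """Record every consecutive pair of tool calls in a benign trace."""
        prev = self.START
        for tool in trace:
            self.allowed[prev].add(tool)
            prev = tool

    def check(self, history, proposed):
        """Return True only if `proposed` has followed the last tool before."""
        prev = history[-1] if history else self.START
        return proposed in self.allowed.get(prev, set())


bounds = TransitionBounds()
bounds.learn(["read_email", "summarize", "reply"])
bounds.learn(["read_email", "search_calendar", "reply"])

# A transition seen during learning is in bounds.
in_bounds = bounds.check(["read_email"], "summarize")
# A transition never seen in benign traces is rejected.
out_of_bounds = bounds.check(["read_email"], "send_payment")
```

A real system would need far richer context than the previous tool alone, which is presumably why the paper adds provenance and an intent check on top of structural patterns.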

Why it matters

Most agent defenses still rely on prompts, allowlists, or coarse output filters; those are weak when the dangerous step is an otherwise valid tool invocation.
Execution provenance is a useful angle because agent compromise often shows up as an anomalous action sequence, not just toxic text.
The work is a practical reminder that safe agents need runtime policy enforcement, not just better model alignment.

What to do

Log execution traces: capture tool sequence, parameters, and data flow so you can define what “normal” looks like.
Gate high-risk actions: put approvals or hard checks around file writes, outbound network access, credential reads, and privilege changes.
Test with injected context: benchmark agents against indirect prompt injection scenarios instead of measuring only normal task success.
Prefer bounded agents: if an agent has a narrow job, enforce a narrow execution envelope rather than leaving the full toolset open.
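
The first two recommendations can be combined in a thin wrapper around tool dispatch. The sketch below is one possible shape under stated assumptions: `GatedExecutor`, the `HIGH_RISK` set, and the tool names are all hypothetical, the approval hook is a plain callback, and only tool names and parameters are logged (data-flow tracking is left out).

```python
import json
import time

# Hypothetical names for tools that should never run without approval.
HIGH_RISK = {"write_file", "http_post", "read_credentials", "change_privileges"}


class GatedExecutor:
    """Logs every tool call and requires approval for high-risk ones."""

    def __init__(self, tools, approve):
        self.tools = tools      # tool name -> callable
        self.approve = approve  # callable(name, params) -> bool
        self.trace = []         # ordered list of call records

    def call(self, name, **params):
        # Record the attempt before deciding, so blocked calls are logged too.
        record = {"ts": time.time(), "tool": name, "params": params,
                  "status": "pending"}
        self.trace.append(record)
        if name not in self.tools:
            record["status"] = "unknown_tool"
            raise KeyError(name)
        if name in HIGH_RISK and not self.approve(name, params):
            record["status"] = "blocked"
            raise PermissionError(f"{name} requires approval")
        result = self.tools[name](**params)
        record["status"] = "ok"
        return result

    def dump(self):
        """Serialize the trace as JSON lines for later analysis."""
        return "\n".join(json.dumps(r, sort_keys=True, default=str)
                         for r in self.trace)


executor = GatedExecutor(
    tools={
        "read_email": lambda msg_id: f"body of {msg_id}",
        "write_file": lambda path, data: len(data),
    },
    approve=lambda name, params: False,  # demo policy: deny all high-risk calls
)

body = executor.call("read_email", msg_id="m1")
try:
    executor.call("write_file", path="/tmp/x", data=body)
    blocked = False
except PermissionError:
    blocked = True
```

Logging the attempt before the policy decision means the trace also records what the agent tried to do, which is the signal needed to define "normal" and to spot injected behavior.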
