arXiv: Agent-Sentry, bounding LLM agents

AI relevance: Agent-Sentry targets a core agent security problem: stopping LLM-driven tool execution when runtime behavior drifts beyond the user’s intended task.

  • The paper introduces Agent-Sentry, a runtime control layer for agentic systems that learns expected execution behavior from prior traces.
  • Its core idea is to bound agent behavior instead of treating every possible tool sequence as acceptable just because the model can generate it.
  • The framework builds functionality graphs that capture recurring control flow and data provenance across benign and adversarial runs.
  • At runtime, Agent-Sentry compares proposed actions against those learned patterns and blocks out-of-bounds tool calls.
  • The design adds a trusted intent-alignment check for cases where structural traces alone are ambiguous.
  • The authors position this as a defense against indirect prompt injection and other compromises that push agents into irrelevant or dangerous actions.
  • On the paper’s benchmarks, Agent-Sentry reportedly blocks over 90% of attacks while preserving up to 98% utility.
  • The threat model maps well to email, workflow, and enterprise assistant agents where read → decide → act chains can be hijacked by untrusted content.
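
As a rough illustration of the bounding idea above, the sketch below learns which tool-to-tool transitions appear in benign traces and rejects a proposed call that follows no learned edge. It is not the paper's implementation: the tool names, the START marker, and the functions learn_transitions and is_in_bounds are hypothetical. It covers control flow only and leaves out data provenance and the intent-alignment check.

```python
from collections import defaultdict

START = "<start>"  # hypothetical marker for "no tool has run yet"

def learn_transitions(traces):
    """Collect every (previous tool -> next tool) edge seen in benign traces."""
    allowed = defaultdict(set)
    for trace in traces:
        prev = START
        for tool in trace:
            allowed[prev].add(tool)
            prev = tool
    return allowed

def is_in_bounds(allowed, history, proposed):
    """True if `proposed` follows the last executed tool in some benign trace."""
    prev = history[-1] if history else START
    return proposed in allowed.get(prev, set())

# Hypothetical benign traces for an email assistant.
benign_traces = [
    ["read_email", "summarize", "reply_email"],
    ["read_email", "search_calendar", "reply_email"],
]
allowed = learn_transitions(benign_traces)

# A transition seen in benign runs passes; an unseen one is rejected.
print(is_in_bounds(allowed, ["read_email"], "summarize"))       # seen edge
print(is_in_bounds(allowed, ["read_email"], "transfer_funds"))  # unseen edge
```

A first-order transition check like this is deliberately coarse. A real system would also need to track where each argument came from, which is the role the summary assigns to data provenance.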

Why it matters

  • Most agent defenses still rely on prompts, allowlists, or coarse output filters; those are weak when the dangerous step is an otherwise valid tool invocation.
  • Execution provenance is a useful angle because agent compromise often shows up as an anomalous action sequence, not just toxic text.
  • The work is a practical reminder that safe agents need runtime policy enforcement, not just better model alignment.

What to do

  • Log execution traces: capture tool sequence, parameters, and data flow so you can define what “normal” looks like.
  • Gate high-risk actions: put approvals or hard checks around file writes, outbound network access, credential reads, and privilege changes.
  • Test with injected context: benchmark agents against indirect prompt injection scenarios rather than measuring only benign task success.
  • Prefer bounded agents: if an agent has a narrow job, enforce a narrow execution envelope rather than leaving the full toolset open.
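
The first, second, and fourth recommendations can be combined in one small wrapper around tool dispatch. The sketch below is a minimal example under stated assumptions: GuardedExecutor, the HIGH_RISK set, the tool names, and the approve callback are all hypothetical, and a real deployment would execute the tool and persist the trace rather than only returning a decision.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical names for the high-risk categories listed above.
HIGH_RISK = {"write_file", "http_post", "read_credentials", "change_permissions"}

@dataclass
class GuardedExecutor:
    allowed_tools: set                            # narrow envelope for this agent
    approve: Callable[[str, dict], bool]          # human or policy approval hook
    trace: list = field(default_factory=list)     # execution log

    def call(self, tool: str, params: dict, source: str = "user") -> str:
        """Decide whether a tool call may run, and log the decision."""
        if tool not in self.allowed_tools:
            decision = "blocked"    # outside the agent's envelope
        elif tool in HIGH_RISK and not self.approve(tool, params):
            decision = "denied"     # high-risk call without approval
        else:
            decision = "allowed"
        # Record tool, parameters, and where the triggering data came from.
        self.trace.append(
            {"tool": tool, "params": params, "source": source, "decision": decision}
        )
        return decision

# Example: an email agent whose approval hook rejects everything.
executor = GuardedExecutor(
    allowed_tools={"read_email", "write_file"},
    approve=lambda tool, params: False,
)
print(executor.call("read_email", {"id": 1}))
print(executor.call("write_file", {"path": "notes.txt"}, source="email_body"))
print(executor.call("http_post", {"url": "example.invalid"}, source="email_body"))
```

Recording the source field alongside each call makes it possible to ask later whether a high-risk action was triggered by untrusted content, which is the kind of question an injected-context test needs to answer.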
