arXiv — Contextualized privacy defense for LLM agents

AI relevance: The work targets privacy failures in multi-step agent workflows by adding a context-aware instructor model that guides tool use.

  • The paper introduces Contextualized Defense Instructing (CDI) for privacy in LLM agents.
  • CDI inserts a step-specific instructor model that generates privacy guidance during execution rather than only blocking outputs.
  • Training uses reinforcement learning on failure trajectories that include privacy violations.
  • The authors formalize intervention points in a canonical agent loop to compare baseline defenses with CDI.
  • The authors report 94.2% privacy preservation with 80.6% helpfulness in their evaluation framework.
  • CDI is more robust under adversarial conditions than static prompts or guards.
  • The study frames privacy as a dynamic, contextual decision across multi-step tool use.
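
To make the instructor idea concrete, here is a minimal Python sketch of an agent loop with a per-step guidance hook. It is not the paper's implementation: `instructor`, `policy`, `run_agent`, the keyword rule, and the toy tools are all hypothetical names chosen for illustration, and a real CDI instructor would be a trained model rather than a hand-written rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    tool: str
    argument: str
    guidance: str


def instructor(task: str, history: List[Step], observation: str) -> str:
    """Hypothetical stand-in for a learned instructor model.

    It reads the current context and returns step-specific privacy guidance.
    """
    if "ssn" in observation.lower():
        return "Do not forward the SSN; share only what the task requires."
    return "No sensitive data in context; proceed."


def policy(task: str, observation: str, guidance: str) -> Tuple[str, str]:
    """Toy agent policy (assumption): picks the next tool call given guidance."""
    if observation == "":
        return ("read_record", "alice")
    if guidance.startswith("Do not"):
        # Follow the guidance: send a minimal message without the raw record.
        return ("send_email", "Alice's appointment is confirmed.")
    return ("send_email", observation)


def run_agent(task: str, tools: Dict[str, Callable[[str], str]],
              max_steps: int = 2) -> List[Step]:
    history: List[Step] = []
    observation = ""
    for _ in range(max_steps):
        # Intervention point: guidance is generated before each action.
        guidance = instructor(task, history, observation)
        tool, argument = policy(task, observation, guidance)
        observation = tools[tool](argument)
        history.append(Step(tool, argument, guidance))
    return history


# Toy tools with made-up data, for illustration only.
tools = {
    "read_record": lambda name: f"{name}: appointment Tuesday, SSN 000-00-0000",
    "send_email": lambda body: f"sent: {body}",
}

trace = run_agent("Confirm Alice's appointment by email", tools)
for step in trace:
    print(step.tool, "|", step.guidance)
```

The point of the sketch is the placement of the hook: guidance is produced inside the loop from the latest observation, so it can change from one tool call to the next, whereas an output filter would only see the final email.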

Why it matters

  • Agents often handle sensitive data across multiple steps, where a single bad action can leak private information.
  • Static safety prompts don’t adapt as the context changes between tool calls.
  • Privacy-preserving automation is a prerequisite for enterprise-grade agent deployments.

What to do

  • Model privacy as a runtime control: add step-aware checks instead of only output filters.
  • Log privacy decision points: capture when agents touch sensitive sources or credentials.
  • Benchmark tradeoffs: measure privacy vs. helpfulness in agent evaluations and red-team runs.
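
The three recommendations above can be sketched together in a few lines of Python. This is an illustrative sketch under stated assumptions, not a production guard: the SSN-style regex, the `EXTERNAL_TOOLS` set, and the `check_step` and `summarize` helpers are hypothetical, and real deployments would need a much richer notion of what counts as sensitive.

```python
import logging
import re
from typing import Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("privacy")

# Assumption: a simple pattern for SSN-like strings stands in for a real detector.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumption: tools that send data outside the trust boundary.
EXTERNAL_TOOLS = {"send_email", "http_post"}


def check_step(step_index: int, tool: str, argument: str) -> bool:
    """Step-aware check: return True if the tool call may proceed.

    Every decision is logged so privacy decision points can be audited later.
    """
    touches_sensitive = bool(SENSITIVE.search(argument))
    allowed = not (touches_sensitive and tool in EXTERNAL_TOOLS)
    log.info("step=%d tool=%s sensitive=%s allowed=%s",
             step_index, tool, touches_sensitive, allowed)
    return allowed


def summarize(runs: List[Dict[str, bool]]) -> Dict[str, float]:
    """Report privacy and helpfulness rates side by side for a batch of runs."""
    n = len(runs)
    return {
        "privacy_rate": sum(not r["leaked"] for r in runs) / n,
        "helpfulness_rate": sum(r["task_done"] for r in runs) / n,
    }


# Made-up evaluation outcomes, for illustration only.
runs = [
    {"leaked": False, "task_done": True},
    {"leaked": False, "task_done": True},
    {"leaked": False, "task_done": False},
    {"leaked": True, "task_done": True},
]
report = summarize(runs)
```

Reporting both rates from the same runs keeps the tradeoff visible: a guard that blocks every external call would score perfectly on privacy while completing no tasks.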

Sources