arXiv — Contextualized privacy defense for LLM agents
AI relevance: The work targets privacy failures in multi-step agent workflows by adding a context-aware instructor model that guides tool use.
- The paper introduces Contextualized Defense Instructing (CDI) for privacy in LLM agents.
- CDI inserts a step-specific instructor model that generates privacy guidance during execution rather than only blocking outputs.
- Training uses reinforcement learning on failure trajectories that include privacy violations.
- The authors formalize intervention points in a canonical agent loop to compare baseline defenses with CDI.
- The authors report 94.2% privacy preservation and 80.6% helpfulness in their evaluation framework.
- CDI is reported to be more robust under adversarial conditions than static prompts or guards.
- The study frames privacy as a dynamic, contextual decision across multi-step tool use.
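To make the intervention point concrete, here is a minimal sketch of an agent step in which an instructor produces guidance before the agent picks its next tool call. It is not the paper's implementation: `toy_instructor`, `toy_agent_policy`, `run_step`, and the keyword rule are hypothetical stand-ins, and a real system would presumably call a trained model in both places.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    tool: str
    argument: str


@dataclass
class AgentState:
    task: str
    history: List[str] = field(default_factory=list)


def toy_instructor(state: AgentState, observation: str) -> str:
    """Hypothetical stand-in for an instructor model.

    Returns step-specific privacy guidance, or an empty string when it
    sees nothing sensitive in the current observation.
    """
    if "ssn" in observation.lower():
        return "Do not forward the SSN; share only the appointment time."
    return ""


def toy_agent_policy(state: AgentState, observation: str, guidance: str) -> Step:
    """Hypothetical agent policy.

    A real agent would condition an LLM on the guidance; this toy version
    sends a fixed redacted payload whenever guidance is present.
    """
    payload = "appointment at 3pm" if guidance else observation
    return Step(tool="send_email", argument=payload)


def run_step(
    state: AgentState,
    observation: str,
    instructor: Callable[[AgentState, str], str],
    policy: Callable[[AgentState, str, str], Step],
) -> Step:
    # Intervention point: guidance is generated before the action is chosen,
    # so it can steer the tool call rather than only block the final output.
    guidance = instructor(state, observation)
    step = policy(state, observation, guidance)
    state.history.append(f"guidance={guidance!r} -> {step.tool}({step.argument!r})")
    return step


if __name__ == "__main__":
    state = AgentState(task="Confirm the appointment with the clinic")
    step = run_step(
        state,
        "SSN 123-45-6789, appointment at 3pm",
        toy_instructor,
        toy_agent_policy,
    )
    print(step)
```

The design point the sketch tries to capture is that guidance is computed per step from the current context, in contrast to a single static prompt fixed before the run starts.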
Why it matters
- Agents often handle sensitive data across multiple steps, where a single bad action can leak private information.
- Static safety prompts don’t adapt to changing context during tool calls.
- Privacy-preserving automation is a prerequisite for enterprise-grade agent deployments.
What to do
- Model privacy as a runtime control: add step-aware checks instead of only output filters.
- Log privacy decision points: capture when agents touch sensitive sources or credentials.
- Benchmark tradeoffs: measure privacy vs. helpfulness in agent evaluations and red-team runs.
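The three recommendations above can be sketched together as a step-level check that logs each decision, plus a summary of privacy versus helpfulness over a batch of runs. This is an illustration under stated assumptions, not the paper's method: the regex patterns, `check_tool_call`, `EpisodeResult`, and the metric definitions (fraction of episodes without a leak, fraction of episodes with the task completed) are all hypothetical, and the paper may define its metrics differently.

```python
import logging
import re
from dataclasses import dataclass
from typing import Dict, List, Set

logger = logging.getLogger("privacy_audit")

# Hypothetical patterns; a real deployment would need a far richer detector.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{8,}\b"),
}


def find_sensitive(text: str) -> List[str]:
    """Return the names of all sensitive patterns found in the text."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]


def check_tool_call(step_index: int, tool: str, argument: str, allowed: Set[str]) -> bool:
    """Step-aware check: allow the call only if every sensitive type in the
    argument is permitted for this step. Every decision is logged."""
    hits = find_sensitive(argument)
    violations = [h for h in hits if h not in allowed]
    logger.info(
        "step=%d tool=%s sensitive=%s violations=%s",
        step_index, tool, hits, violations,
    )
    return not violations


@dataclass
class EpisodeResult:
    leaked: bool          # did any step leak sensitive data?
    task_completed: bool  # did the agent still finish the task?


def summarize(results: List[EpisodeResult]) -> Dict[str, float]:
    """Report both axes so a defense cannot look good by refusing everything."""
    if not results:
        raise ValueError("no episodes to summarize")
    n = len(results)
    return {
        "privacy_preservation": sum(not r.leaked for r in results) / n,
        "helpfulness": sum(r.task_completed for r in results) / n,
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(check_tool_call(0, "send_email", "SSN 123-45-6789", allowed=set()))
    print(summarize([EpisodeResult(False, True), EpisodeResult(True, True)]))
```

Running the same summary over ordinary and red-team episodes makes it visible whether a gain in privacy was paid for with a drop in task completion.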