T-MAP — Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
AI relevance: T-MAP is the first red-teaming framework to use full execution trajectories — not just final outputs — to discover multi-step attack paths against LLM agents with tool access, specifically evaluated in MCP environments where agents interact with external systems.
What happened
- Researchers from KAIST and collaborators published "T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search" on arXiv (arXiv:2603.22341).
- The framework uses execution trajectories to guide evolutionary search for adversarial prompts, rather than evaluating only the final model output — capturing the full multi-step tool-use chain that agents execute.
- T-MAP achieved a 57.8% average attack realization rate (ARR) across diverse MCP environments, substantially outperforming prior baselines that only optimize for harmful text generation.
- Attacks remained effective against frontier models including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, demonstrating that current safety guardrails are insufficient for agentic tool-use scenarios.
- The trajectory-aware approach discovers attack paths that single-step prompt injection methods miss, because it evaluates whether the agent's tool calls actually achieve the harmful objective, not merely whether the model generates harmful text.
- The research highlights a critical gap: agents with tool access (MCP servers, function calling, code execution) face qualitatively different risks than text-only LLMs, since successful attacks cause real-world actions rather than just harmful outputs.
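The core idea, scoring candidate prompts by the tool-call trajectory they induce rather than by the text they elicit, can be sketched as a small evolutionary loop. Everything below is illustrative: the toy `run_agent`, the token pool, and the fitness definition are stand-ins invented for this sketch, not the paper's actual components.

```python
import random

# Hypothetical objective: the set of tool calls an attack must get the
# agent to execute for the attack to count as "realized".
HARMFUL_GOAL = {"read_secrets", "exfiltrate"}

def run_agent(prompt):
    """Toy stand-in for an MCP agent: returns the tool-call trajectory
    it would execute for a given prompt."""
    calls = []
    if "config" in prompt:
        calls.append("read_secrets")
    if "upload" in prompt:
        calls.append("exfiltrate")
    return calls

def trajectory_fitness(trajectory):
    """Score the *executed* trajectory, not the text output: the fraction
    of the harmful objective's tool calls the agent actually performed."""
    return len(set(trajectory) & HARMFUL_GOAL) / len(HARMFUL_GOAL)

TOKENS = ["please", "config", "upload", "summarize", "backup"]

def mutate(prompt, rng):
    """Minimal mutation operator: append one random token."""
    return prompt + " " + rng.choice(TOKENS)

def evolve(seed_prompt, generations=20, pop_size=8, rng=None):
    """Evolutionary search guided by trajectory fitness: mutate prompts,
    run each through the agent, keep the ones whose executed tool-call
    chains get closest to the harmful objective."""
    rng = rng or random.Random(0)
    population = [seed_prompt]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        population = sorted(population + children,
                            key=lambda p: trajectory_fitness(run_agent(p)),
                            reverse=True)[:pop_size]
    best = population[0]
    return best, trajectory_fitness(run_agent(best))
```

The point of the sketch is the fitness function: a baseline that scored only harmful text would never distinguish a prompt that makes the agent *call* `read_secrets` from one that merely makes it *talk about* secrets.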
Why it matters
As organizations deploy LLM agents with MCP tool access for production tasks — database queries, API calls, code execution, file operations — the attack surface expands far beyond what traditional prompt injection defenses cover. T-MAP shows that over half of attack attempts succeed against frontier models when the evaluation includes actual tool execution. This means current red-teaming and safety evaluation practices, which focus on text outputs, may dramatically underestimate agentic risk.
What to do
- Adopt trajectory-based red-teaming: Evaluate agent safety by measuring whether adversarial inputs lead to harmful tool-use outcomes, not just harmful text generation.
- Harden MCP tool permissions: Apply least-privilege principles to every tool an agent can call — assume any prompt input path may be adversarial.
- Monitor tool-use sequences: Implement detection for unusual multi-step tool invocation patterns that may indicate exploitation in progress.
- Test against trajectory-aware attacks: Use frameworks like T-MAP to proactively discover agent vulnerabilities before deployment.
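The least-privilege recommendation above can be made concrete with a deny-by-default gate in front of every tool call. This is a minimal sketch, not a real MCP API: the role names, the `ALLOWED` policy table, and the `ToolDenied` error are all hypothetical.

```python
class ToolDenied(Exception):
    """Raised when an agent role requests a tool outside its allowlist."""
    pass

# Hypothetical per-role allowlists: each agent may call only the tools
# its task actually requires, nothing more.
ALLOWED = {
    "support-bot": {"search_docs", "read_ticket"},
    "deploy-bot": {"read_ticket", "run_migration"},
}

def call_tool(role, tool, handler, *args):
    """Deny-by-default gate: a tool handler runs only if the role's
    allowlist explicitly names that tool."""
    if tool not in ALLOWED.get(role, set()):
        raise ToolDenied(f"{role} may not call {tool}")
    return handler(*args)
```

Because the gate assumes every prompt input path may be adversarial, a successful injection against `support-bot` still cannot reach `run_migration`; the blast radius is bounded by the allowlist rather than by the model's judgment.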
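The monitoring recommendation can likewise be sketched as a simple baseline check on tool-call sequences: record which consecutive tool pairs occur in known-benign traces, then flag pairs never seen before. Function names and the bigram heuristic are illustrative assumptions, not a production detector.

```python
def train_baseline(benign_traces):
    """Collect the set of consecutive tool-call pairs (bigrams) observed
    in known-benign agent runs."""
    seen = set()
    for trace in benign_traces:
        seen.update(zip(trace, trace[1:]))
    return seen

def flag_anomalies(trace, baseline):
    """Return the tool-call bigrams in `trace` that never occurred in the
    benign baseline; any hit is a candidate exploitation-in-progress."""
    return [pair for pair in zip(trace, trace[1:]) if pair not in baseline]
```

A real deployment would need longer n-grams, argument inspection, and rate signals, but even this coarse check catches the multi-step pattern the paper exploits: individually benign tools chained in an order no legitimate workflow produces.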