arXiv — SENTINEL: securing AI agents in cyber-physical systems against deepfake and MCP-mediated attacks

• Category: Research

  • New survey paper (arXiv 2601.20184, Jan 28, 2026) provides a comprehensive review of security threats to AI agents deployed in cyber-physical systems (CPS), such as power grids, industrial control, autonomous vehicles, and smart infrastructure, where agents interact with physical environments.
  • SENTINEL framework: the paper introduces a lifecycle-aware security methodology with four phases — threat characterization, feasibility analysis under CPS constraints, defense selection, and continuous validation — designed specifically for environments where false positives have physical-safety consequences.
  • Deepfake and semantic manipulation: generative AI now enables attacks that compromise agent perception and reasoning through synthetic sensor data, manipulated voice/video inputs, and semantically crafted documents — going beyond traditional network-layer attacks to target the agent's understanding of its physical environment.
  • MCP as an expanded attack surface: the paper identifies the dynamic tool use and cross-domain context sharing of the Model Context Protocol (MCP) as a significant new attack vector. MCP lets agents connect to external data sources, models, and APIs, and each connection is a trust boundary that adversaries can exploit.
  • Multi-agent attack scenarios: the survey documents how untrusted content and peer-agent messages can induce unsafe tool calls and system-level compromise when trust boundaries between cooperating agents are weak — particularly dangerous in CPS where multiple agents coordinate physical processes.
  • Detection alone is insufficient: a key finding from the smart grid case study is that detection mechanisms cannot serve as decision authorities in safety-critical CPS because timing constraints, environmental noise, and false-positive costs make reactive-only defenses inadequate.
  • Provenance- and physics-grounded trust: the paper advocates using physical-world invariants (sensor physics, timing constraints, known process dynamics) as anchors that AI-generated or AI-mediated data must satisfy before it is acted upon.
  • Defense-in-depth architecture: rather than relying on any single layer, the survey recommends layered defenses combining input validation, reasoning-trace auditing, tool-call authorization, and physical-world consistency checks.
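The layered, physics-anchored gating described above can be sketched in Python as a chain of veto layers. This is a minimal illustration under stated assumptions, not the paper's implementation: the `ProposedAction` structure, the allowlists, the 800 kPa limit, the tolerance, and the linear pressure model are all hypothetical, and reasoning-trace auditing is left out for brevity. Each layer can only veto, so an action executes only when every check passes, and the physics check compares the agent's claim against an independent process model rather than trusting the agent's own reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class ProposedAction:
    """A tool call an agent wants to execute (hypothetical structure)."""
    tool: str                    # tool the agent wants to invoke
    source: str                  # origin of the input that triggered the call
    valve_opening: float         # requested opening, 0.0 (closed) to 1.0 (open)
    claimed_pressure_kpa: float  # downstream pressure the agent says will result

# A layer returns None to pass the action, or a reason string to veto it.
Layer = Callable[[ProposedAction], Optional[str]]

TRUSTED_SOURCES = {"historian", "operator-console"}  # hypothetical allowlist
AUTHORIZED_TOOLS = {"set_valve"}                     # hypothetical tool name
MAX_PRESSURE_KPA = 800.0                             # hypothetical plant limit
TOLERANCE_KPA = 25.0                                 # allowed agent/model disagreement

def model_pressure_kpa(opening: float) -> float:
    """Toy linear process model; a real plant would use its own validated model."""
    return 200.0 + 700.0 * opening

def input_validation(a: ProposedAction) -> Optional[str]:
    if a.source not in TRUSTED_SOURCES:
        return f"untrusted input source {a.source!r}"
    return None

def tool_authorization(a: ProposedAction) -> Optional[str]:
    if a.tool not in AUTHORIZED_TOOLS:
        return f"tool {a.tool!r} is not authorized"
    return None

def physics_consistency(a: ProposedAction) -> Optional[str]:
    if not 0.0 <= a.valve_opening <= 1.0:
        return "valve opening outside physical range"
    expected = model_pressure_kpa(a.valve_opening)
    if expected > MAX_PRESSURE_KPA:
        return f"model predicts {expected:.0f} kPa, above the {MAX_PRESSURE_KPA:.0f} kPa limit"
    if abs(expected - a.claimed_pressure_kpa) > TOLERANCE_KPA:
        return "agent's pressure estimate disagrees with the process model"
    return None

LAYERS: List[Layer] = [input_validation, tool_authorization, physics_consistency]

def gate(a: ProposedAction) -> Tuple[bool, List[str]]:
    """Run every layer; the action may execute only if no layer vetoes it."""
    reasons: List[str] = []
    for layer in LAYERS:
        reason = layer(a)
        if reason is not None:
            reasons.append(reason)
    return (len(reasons) == 0, reasons)
```

For example, a `set_valve` request at 90% opening is vetoed because the toy model predicts 830 kPa, and a request whose claimed pressure disagrees with the model by more than the tolerance is vetoed even when it stays under the limit.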

Why it matters

  • AI agents are moving from pure-software environments (chatbots, coding assistants) into CPS where compromised reasoning has physical consequences — a manipulated grid-management agent can cause equipment damage or outages.
  • The SENTINEL framework provides a structured methodology that security teams can adopt for threat modeling AI agent deployments in critical infrastructure, filling a gap left by software-focused frameworks like OWASP's Agentic Top 10.
  • The finding that detection-only approaches are insufficient in CPS directly challenges the "monitor and alert" posture many organizations default to when deploying agents alongside physical systems.

What to do

  • Assess CPS agent deployments against SENTINEL: map your AI agents' environmental interactions, tool bindings, and inter-agent communication against the framework's threat characterization phase.
  • Implement physics-grounded validation: for agents operating in physical environments, enforce consistency checks against known process dynamics; for example, a valve-change command from an agent should be validated against pressure/flow models before it executes.
  • Harden MCP trust boundaries: treat every MCP connection as a potential injection vector; authenticate tool servers, validate response schemas, and restrict cross-domain context sharing to explicitly allowed sources.
  • Plan for false-positive costs: in safety-critical deployments, design defense thresholds around the cost of false positives (unnecessary shutdowns, delayed operations) rather than optimizing purely for detection rate.
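One way to harden an MCP trust boundary is to check, before anything reaches the agent's context, that a tool response came from an approved server, matches the expected schema, and targets only the context domains that server is allowed to feed. The sketch below is hypothetical throughout: the registry, the shared-secret HMAC scheme (a stand-in for whatever server authentication a deployment actually uses, such as mutual TLS), and the sensor payload schema are assumptions, not part of the MCP specification or any MCP SDK.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

# Hypothetical registry of approved tool servers: a shared secret per server and
# the context domains each server may contribute to.
SERVER_REGISTRY: Dict[str, Dict[str, Any]] = {
    "grid-telemetry": {"key": b"demo-secret-telemetry", "domains": {"telemetry"}},
}

# Hypothetical schema for a sensor-reading tool: field name -> accepted type(s).
SENSOR_SCHEMA: Dict[str, Any] = {
    "sensor_id": str,
    "value": (int, float),
    "unit": str,
    "domain": str,
}

class UntrustedResponse(Exception):
    """Raised when a tool response fails any trust-boundary check."""

def sign_for_demo(server: str, body: bytes) -> str:
    """Simulates the server side: HMAC-SHA256 over the raw response body."""
    return hmac.new(SERVER_REGISTRY[server]["key"], body, hashlib.sha256).hexdigest()

def verify_response(server: str, body: bytes, signature_hex: str) -> Dict[str, Any]:
    # 1. Only allowlisted servers are accepted.
    entry = SERVER_REGISTRY.get(server)
    if entry is None:
        raise UntrustedResponse(f"server {server!r} is not on the allowlist")
    # 2. Authenticate the raw body before parsing it (constant-time comparison).
    expected = hmac.new(entry["key"], body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature_hex.encode("utf-8")):
        raise UntrustedResponse("signature mismatch")
    # 3. Validate the schema: exact field set and types. bool is rejected
    #    explicitly because it is a subclass of int in Python.
    try:
        payload = json.loads(body)
    except ValueError as exc:
        raise UntrustedResponse("response is not valid JSON") from exc
    if not isinstance(payload, dict) or set(payload) != set(SENSOR_SCHEMA):
        raise UntrustedResponse("unexpected or missing fields")
    for field, accepted in SENSOR_SCHEMA.items():
        value = payload[field]
        if isinstance(value, bool) or not isinstance(value, accepted):
            raise UntrustedResponse(f"field {field!r} has the wrong type")
    # 4. Restrict cross-domain context sharing to explicitly allowed domains.
    if payload["domain"] not in entry["domains"]:
        raise UntrustedResponse(f"{server!r} may not write to domain {payload['domain']!r}")
    return payload
```

The schema check is hand-rolled to keep the sketch dependency-free; a real deployment would more likely use a JSON Schema validator alongside its existing server authentication.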
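Designing thresholds around false-positive cost can be made concrete by choosing the alarm threshold that minimizes total cost on labeled validation data, instead of the one that maximizes detection rate. This is a standard cost-sensitive technique, not something the paper prescribes; the function names, scores, labels, and cost weights below are hypothetical.

```python
from typing import List, Sequence, Tuple

def total_cost(threshold: float, scores: Sequence[float], labels: Sequence[bool],
               cost_fp: float, cost_fn: float) -> float:
    """Cost of raising an alarm on every score >= threshold."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return cost_fp * false_pos + cost_fn * false_neg

def pick_threshold(scores: Sequence[float], labels: Sequence[bool],
                   cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Return (threshold, cost) minimizing total cost; ties go to the lower threshold."""
    # Every observed score is a candidate, plus +inf, which means "never alarm".
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best = min(candidates, key=lambda t: total_cost(t, scores, labels, cost_fp, cost_fn))
    return best, total_cost(best, scores, labels, cost_fp, cost_fn)

scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]             # hypothetical detector scores
labels = [False, False, False, True, False, True, True, True]  # True = real attack
print(pick_threshold(scores, labels, cost_fp=1, cost_fn=10))   # (0.4, 1)
print(pick_threshold(scores, labels, cost_fp=10, cost_fn=1))   # (0.7, 1)
```

With the same scores, weighting missed attacks heavily pulls the threshold down, while weighting unnecessary shutdowns heavily pushes it up. When a detector outputs calibrated attack probabilities, the equivalent rule is to intervene only when p > C_FP / (C_FP + C_FN).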

Links