LangChain HTMLHeaderTextSplitter SSRF Redirect Bypass (CVE-2026-41481)

AI relevance: LangChain is one of the most widely used agent and RAG frameworks, and this SSRF bypass in its text-splitting package affects any pipeline that ingests web URLs through HTMLHeaderTextSplitter.split_text_from_url(). An attacker who controls the initial URL can potentially reach internal services, localhost APIs, or cloud metadata endpoints.

What happened

  • CVE-2026-41481 (CVSS 6.5, Medium) disclosed April 24, 2026.
  • Affects langchain-text-splitters prior to version 1.1.2.
  • HTMLHeaderTextSplitter.split_text_from_url() validates the initial URL with validate_safe_url(), but then fetches it with requests.get(), which follows redirects by default (allow_redirects=True).
  • Redirect targets are not re-validated, so an attacker-controlled server can redirect to localhost, internal network services, or cloud metadata endpoints (e.g., 169.254.169.254).
  • The response body is parsed and returned as Document objects to the calling application.
  • If the application exposes Document contents (or derivatives) back to the user who supplied the URL, this becomes a data-exfiltration path for internal endpoint data.
  • Applications that process Documents internally without returning raw content are not directly exposed to exfiltration, but may still ingest unintended internal data.
  • CWE-918 (Server-Side Request Forgery).
  • Fixed in langchain-text-splitters 1.1.2.
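
The flaw is a check-then-fetch gap: validation runs once on the URL the caller supplied, while the HTTP client silently follows whatever Location headers come back. The sketch below shows the safer pattern of disabling automatic redirects and re-validating every hop, assuming you fetch HTML yourself and pass the string to the splitter's split_text() method instead of calling split_text_from_url(). The names is_safe_url, fetch_with_revalidation and requests_fetch_once are hypothetical illustrations; they are not LangChain's validate_safe_url() or the actual 1.1.2 patch. The single-request fetcher is injected so the redirect logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Mapping, Tuple
from urllib.parse import urljoin, urlparse

# A fetcher performs ONE request and must not follow redirects itself.
# It returns (status_code, headers, body).
Fetcher = Callable[[str], Tuple[int, Mapping[str, str], bytes]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_safe_url(url: str) -> bool:
    """Hypothetical check (not LangChain's validate_safe_url): allow only
    http(s) URLs whose host resolves exclusively to globally routable IPs.
    Loopback, RFC 1918 private and link-local (169.254.0.0/16, which holds
    the common cloud metadata address) ranges are all rejected."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        try:
            addr = ipaddress.ip_address(sockaddr[0])
        except ValueError:
            return False
        if not addr.is_global:
            return False
    # Never treat an empty resolution result as safe.
    return bool(infos)


def fetch_with_revalidation(url: str, fetch_once: Fetcher, max_hops: int = 5) -> bytes:
    """Follow redirects by hand, re-validating every hop before requesting it."""
    for _ in range(max_hops + 1):
        if not is_safe_url(url):
            raise ValueError(f"blocked unsafe URL: {url}")
        status, headers, body = fetch_once(url)
        if status not in REDIRECT_CODES:
            return body
        location = headers.get("Location")
        if not location:
            raise ValueError(f"redirect from {url} has no Location header")
        # Relative Location values are resolved against the current URL.
        url = urljoin(url, location)
    raise ValueError(f"more than {max_hops} redirects")


def requests_fetch_once(url: str) -> Tuple[int, Mapping[str, str], bytes]:
    """Example fetcher built on requests. Redirects are disabled so that
    fetch_with_revalidation() sees, and checks, every Location header."""
    import requests  # imported lazily so the rest of the sketch runs without it

    resp = requests.get(url, allow_redirects=False, timeout=10)
    return resp.status_code, resp.headers, resp.content
```

In a pipeline, fetch_with_revalidation(user_url, requests_fetch_once) would return the final page body, which is then decoded and split. Because is_safe_url() resolves the hostname and the HTTP client resolves it again when connecting, a DNS-rebinding window remains; treat this as defense in depth alongside network-level egress controls, not as a replacement for upgrading.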

Why it matters

  • Many RAG pipelines use HTMLHeaderTextSplitter to ingest web content; any pipeline that passes user-supplied URLs to split_text_from_url() on a vulnerable version is at risk.
  • Classic SSRF redirect-bypass pattern, but with AI-specific amplification: the parsed Document contents flow into the agent's context window, potentially influencing downstream tool calls or responses.
  • Cloud deployments are especially exposed: metadata endpoints that answer plain GET requests without extra headers or session tokens (for example, AWS IMDSv1) can reveal IAM credentials, instance roles, and other secrets.
  • Part of a broader wave of LangChain CVEs: CVE-2026-41488 was disclosed the same day, though it is rated low severity.

What to do

  • Upgrade langchain-text-splitters to 1.1.2 or later immediately.
  • Audit RAG pipelines that accept user-supplied URLs through HTML-based splitters.
  • Consider running URL ingestion behind an egress proxy that blocks loopback addresses, RFC 1918 private ranges, and link-local addresses such as the 169.254.169.254 cloud metadata IP.
  • Review whether Document contents are ever returned to the user who supplied the URL; if so, treat this as a direct data-exfiltration risk.
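
To confirm the upgrade actually took effect in a given environment, a small stdlib-only check along these lines can be run inside the same virtual environment as the RAG service. It reads the installed distribution version with importlib.metadata and compares its release segment to 1.1.2. The helper names are hypothetical, and the simplified parser ignores pre-release, post-release and local suffixes, so packaging.version.Version is the better choice where exact PEP 440 ordering matters.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First fixed release of langchain-text-splitters according to the advisory.
PATCHED_RELEASE = (1, 1, 2)


def release_tuple(ver: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '1.1.2' -> (1, 1, 2).
    Simplification: pre-/post-release and local suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", ver.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(ver: str) -> bool:
    """True if the release segment is at or above the first fixed release."""
    return release_tuple(ver) >= PATCHED_RELEASE


def check_installed(dist: str = "langchain-text-splitters") -> str:
    """Report whether the installed distribution includes the fix."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return f"{dist} is not installed"
    verdict = "OK" if is_patched(installed) else "VULNERABLE, upgrade to >= 1.1.2"
    return f"{dist} {installed}: {verdict}"


if __name__ == "__main__":
    print(check_installed())
```

An environment still on 1.1.1 would report "langchain-text-splitters 1.1.1: VULNERABLE, upgrade to >= 1.1.2", which can be fixed with pip install -U "langchain-text-splitters>=1.1.2".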

Sources