LangChain HTMLHeaderTextSplitter SSRF Redirect Bypass (CVE-2026-41481)

2026-04-25 Security by al-ice.ai Editorial

AI relevance: LangChain is one of the most widely used agent and RAG frameworks, and this SSRF bypass in its HTML text-splitting component affects any pipeline that ingests web URLs through HTMLHeaderTextSplitter. An attacker who controls the initial URL could reach internal services, localhost APIs, or cloud metadata endpoints.

What happened

CVE-2026-41481 (CVSS 6.5, Medium) was disclosed on April 24, 2026.
Affects langchain-text-splitters prior to version 1.1.2.
HTMLHeaderTextSplitter.split_text_from_url() validates the initial URL via validate_safe_url(), but then fetches content using requests.get() with redirects enabled by default.
Redirect targets are not re-validated, so an attacker-controlled server can redirect to localhost, internal network services, or cloud metadata endpoints (e.g., 169.254.169.254).
The response body is parsed and returned as Document objects to the calling application.
If the application exposes Document contents (or derivatives) back to the user who supplied the URL, this becomes a data-exfiltration path for internal endpoint data.
Applications that process Documents internally without returning raw content are not directly exposed to exfiltration, but may still ingest unintended internal data.
The issue is classified as CWE-918 (Server-Side Request Forgery).
Fixed in langchain-text-splitters 1.1.2.
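
The redirect bypass described above, and one way to close it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual LangChain patch: is_public_url() is a hypothetical stand-in for a validator such as validate_safe_url(), fetch_with_checked_redirects() is a made-up helper name, and the `get` parameter is any callable shaped like requests.get. The idea is to turn off automatic redirects and validate every hop before it is fetched, instead of validating only the first URL.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_public_url(url: str) -> bool:
    """Hypothetical validator: allow only http(s) URLs whose host
    resolves exclusively to globally routable addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False
    # Loopback, RFC 1918, and link-local (169.254.0.0/16) are not global.
    return bool(infos) and all(
        ipaddress.ip_address(info[4][0]).is_global for info in infos
    )


def fetch_with_checked_redirects(url, get, max_hops=5):
    """Follow redirects manually, validating each hop before fetching it.

    `get` is assumed to accept (url, allow_redirects=..., timeout=...),
    like requests.get.
    """
    for _ in range(max_hops + 1):
        if not is_public_url(url):
            raise ValueError(f"blocked URL: {url}")
        response = get(url, allow_redirects=False, timeout=10)
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            return response
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError("too many redirects")
```

With the requests library installed, a call would look like fetch_with_checked_redirects(user_url, requests.get). Note that this sketch still resolves the hostname separately from the actual connection, so it does not by itself defend against DNS rebinding; network-level egress controls remain a useful second layer.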

Why it matters

Many RAG pipelines use HTMLHeaderTextSplitter to ingest web content, and any pipeline that passes user-supplied URLs to split_text_from_url() is at risk.
This is a classic SSRF redirect-bypass pattern with an AI-specific amplification: the parsed Document contents flow into the agent's context window, where they can influence downstream tool calls or responses.
Cloud deployments are especially exposed — metadata endpoint access can reveal IAM credentials, instance roles, and secrets.
This is part of a broader wave of LangChain CVEs: CVE-2026-41488, rated low severity, was disclosed the same day.

What to do

Upgrade langchain-text-splitters to 1.1.2 or later immediately.
Audit RAG pipelines that accept user-supplied URLs through HTML-based splitters.
Consider running URL ingestion behind an egress proxy that blocks requests to loopback addresses, RFC 1918 private ranges, and cloud metadata IPs such as 169.254.169.254.
Review whether Document contents are ever returned to the URL-supplying user — if so, treat this as a direct data-exfiltration risk.
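
As a quick aid for the upgrade step, the installed version can be checked programmatically. The following is a rough sketch, assuming the distribution is installed under the name langchain-text-splitters; is_patched() and check_installed() are hypothetical helper names, and the comparison looks only at the numeric release segment, so pre-release or local version suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First fixed release according to the advisory.
FIXED = (1, 1, 2)


def is_patched(installed: str) -> bool:
    """Return True if the numeric release part is >= 1.1.2."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", installed)
    if not match:
        return False  # unparseable version: treat as unpatched
    release = tuple(int(part or 0) for part in match.groups())
    return release >= FIXED


def check_installed(package: str = "langchain-text-splitters"):
    """Return True/False for the installed package, or None if absent."""
    try:
        return is_patched(version(package))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    status = check_installed()
    if status is None:
        print("langchain-text-splitters is not installed")
    elif status:
        print("installed version includes the fix")
    else:
        print("vulnerable version installed: upgrade to 1.1.2 or later")
```

A check like this could run in CI or at service startup, though a dependency scanner or a pinned lock file is the more robust way to enforce the minimum version.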
