LangChain HTMLHeaderTextSplitter SSRF Redirect Bypass (CVE-2026-41481)
AI relevance: LangChain is the dominant agent and RAG framework, and this SSRF bypass in a widely-used text-splitting component affects any pipeline that ingests web URLs through HTMLHeaderTextSplitter — potentially exposing internal services, localhost APIs, or cloud metadata endpoints to an attacker who controls the initial URL.
What happened
- CVE-2026-41481 (CVSS 6.5, Medium) disclosed April 24, 2026.
- Affects langchain-text-splitters prior to version 1.1.2.
- HTMLHeaderTextSplitter.split_text_from_url() validates the initial URL via validate_safe_url(), but then fetches content using requests.get() with redirects enabled by default.
- Redirect targets are not re-validated, so an attacker-controlled server can redirect to localhost, internal network services, or cloud metadata endpoints (e.g., 169.254.169.254).
- The response body is parsed and returned as Document objects to the calling application.
- If the application exposes Document contents (or derivatives) back to the user who supplied the URL, this becomes a data-exfiltration path for internal endpoint data.
- Applications that process Documents internally without returning raw content are not directly exposed to exfiltration, but may still ingest unintended internal data.
- CWE-918 (Server-Side Request Forgery).
- Fixed in langchain-text-splitters 1.1.2.
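The redirect gap described above can be closed by following redirects manually and validating every hop before it is requested. The following is a minimal sketch of that general technique, not the code shipped in the 1.1.2 fix. The names is_public_address, is_safe_url, and fetch_following_safe_redirects are hypothetical, and the network call is passed in as fetch_once so the logic can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_public_address(host, resolve=socket.getaddrinfo):
    """Return True only if every address the host resolves to is globally routable."""
    try:
        infos = resolve(host, None)
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    except (OSError, ValueError):
        return False  # unresolvable or unparsable: fail closed
    # is_global is False for loopback, RFC 1918, and link-local (169.254.0.0/16).
    return bool(addresses) and all(addr.is_global for addr in addresses)


def is_safe_url(url, resolve=socket.getaddrinfo):
    """Hypothetical stand-in for a URL validator; not the library's validate_safe_url()."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    return is_public_address(parts.hostname, resolve)


def fetch_following_safe_redirects(url, fetch_once, is_safe=is_safe_url, max_hops=5):
    """Follow redirects by hand, validating each hop before requesting it.

    fetch_once(url) must issue ONE request without following redirects and
    return (status_code, location_header_or_None, body).
    """
    for _ in range(max_hops + 1):
        if not is_safe(url):
            raise ValueError(f"blocked unsafe URL: {url}")
        status, location, body = fetch_once(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # resolve relative Location headers
            continue
        return body
    raise ValueError("too many redirects")


# One possible fetch_once using requests (assumes requests is installed):
#
#   def fetch_once(url):
#       r = requests.get(url, allow_redirects=False, timeout=10)
#       return r.status_code, r.headers.get("Location"), r.text
```

This sketch resolves the hostname once for the check while the HTTP client resolves it again for the request, so it does not by itself defend against DNS rebinding. Pinning the connection to the validated IP or enforcing the same rules at an egress proxy would be needed for that.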
Why it matters
- Many RAG pipelines use HTMLHeaderTextSplitter to ingest web content, so any pipeline accepting user-supplied URLs is at risk.
- Classic SSRF redirect-bypass pattern, but with AI-specific amplification: the parsed Document contents flow into the agent's context window, potentially influencing downstream tool calls or responses.
- Cloud deployments are especially exposed — metadata endpoint access can reveal IAM credentials, instance roles, and secrets.
- Part of a broader wave of LangChain CVEs (CVE-2026-41488, rated low severity, was disclosed the same day).
What to do
- Upgrade langchain-text-splitters to 1.1.2 or later immediately.
- Audit RAG pipelines that accept user-supplied URLs through HTML-based splitters.
- Consider running URL ingestion behind an egress proxy that blocks requests to loopback addresses, RFC 1918 private ranges, and cloud metadata IPs such as 169.254.169.254.
- Review whether Document contents are ever returned to the URL-supplying user — if so, treat this as a direct data-exfiltration risk.
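As a starting point for the upgrade check, the installed version can be compared against the fixed release with the standard library alone. This is a rough sketch: parse_release, is_patched, and check_installed are hypothetical helper names, and the parser only reads the leading numeric components, so a pre-release string such as 1.1.2rc1 would be treated the same as 1.1.2.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (1, 1, 2)  # first patched release according to the advisory


def parse_release(version_string):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def is_patched(version_string, fixed=FIXED_VERSION):
    """Return True if the version is at or above the fixed release."""
    release = parse_release(version_string)
    # Pad short versions so '1.2' compares as (1, 2, 0).
    release += (0,) * (len(fixed) - len(release))
    return release >= fixed


def check_installed(package="langchain-text-splitters"):
    """Report whether the package installed in this environment is patched."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    status = "patched" if is_patched(installed) else "VULNERABLE, upgrade to 1.1.2 or later"
    return f"{package} {installed}: {status}"
```

Calling print(check_installed()) in each environment that runs URL ingestion gives a quick inventory. A dependency scanner or lockfile audit is the more thorough option for larger deployments.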