Socket — Shai-Hulud Wave 2 Targets MCP and Bioinformatics PyPI Developers

AI relevance: The Shai-Hulud campaign now publishes packages explicitly themed around MCP (langchain-core-mcp, openai-mcp, tiktoken-mcp, ray-mcp-server) to directly target developers building Model Context Protocol integrations and AI agent toolchains.

Socket Threat Research has identified a second wave of the Shai-Hulud supply chain operation, adding 23 new malicious PyPI artifacts to an already extensive campaign. The total now spans 471 artifacts across npm and PyPI — 411 npm artifacts in 106 packages and 60 PyPI artifacts in 37 packages.

New delivery techniques

This wave introduces three distinct PyPI delivery branches that escalate detection evasion:

  • .pth startup-hook pattern: A malicious wheel bundles a *-setup.pth file alongside _index.js. The hook fires during Python startup, silently downloads the Bun JavaScript runtime, and executes the obfuscated stealer payload.
  • Native extension import trigger: Malicious code is embedded directly inside compiled .abi3.so extensions. The Python source appears clean, but the extension executes _index.js the moment Python loads the module via dlopen() — bypassing source-only review pipelines entirely.
  • langchain-core-mcp loader variant: The most novel technique. The wheel installs a .pth loader but ships without _index.js. Instead, it scans every entry in sys.path and one directory below each entry, searching for the payload elsewhere in the Python environment — creating a split-staging architecture that evades detection rules expecting loader and payload to coexist in the same wheel.
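
The .pth branches above rely on documented behavior of Python's site module: any line in a .pth file that begins with "import" followed by a space or tab is executed at interpreter startup. A minimal audit sketch, assuming the goal is simply to list every .pth file that contains such executable lines (the function names here are illustrative and not taken from Socket's tooling):

```python
import site
from pathlib import Path


def executable_pth_lines(pth_text: str) -> list[str]:
    """Return the lines of a .pth file that the site module would execute.

    site executes lines starting with "import" followed by a space or tab;
    other non-comment lines are treated as paths to append to sys.path.
    """
    return [
        line
        for line in pth_text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


def audit_pth_files(directories: list[str]) -> dict[str, list[str]]:
    """Map each .pth file that contains executable lines to those lines."""
    findings: dict[str, list[str]] = {}
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for pth in sorted(root.glob("*.pth")):
            try:
                text = pth.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than abort the audit
            hits = executable_pth_lines(text)
            if hits:
                findings[str(pth)] = hits
    return findings


if __name__ == "__main__":
    dirs = site.getsitepackages() + [site.getusersitepackages()]
    for path, lines in audit_pth_files(dirs).items():
        print(path)
        for line in lines:
            print("    " + line[:200])
```

Executable .pth lines are not malicious by themselves; some legitimate tooling, such as editable installs, ships them too. The output is a review list, not a verdict.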

LLM anti-analysis

The _index.js payload embeds a large fake system-instruction block inside a non-executing JavaScript comment at the top of the file. Bun skips the comment at runtime, but its content is written to trigger safety refusals, context pollution, and premature classification in AI-assisted triage pipelines. The technique is novel and specifically aims to poison automated security analysis tools powered by LLMs.
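
One possible mitigation, sketched here as an assumption rather than anything described by Socket, is to separate the leading comment block from the code before a sample reaches an LLM-based triage step. The comment can then be handled as untrusted data. The helper below handles only comments at the very top of a file, which is where the article places the injected block; removing comments elsewhere would need a real JavaScript tokenizer.

```python
def strip_leading_js_comments(source: str) -> tuple[str, str]:
    """Split JavaScript source into (leading_comments, remaining_code).

    Only comments that appear before the first code token are separated.
    No string or regex literal can precede them, so plain scanning is safe.
    """
    pos = 0
    n = len(source)

    # An optional hashbang line may precede everything else.
    if source.startswith("#!"):
        end = source.find("\n")
        pos = n if end == -1 else end + 1

    while True:
        # Skip whitespace between comments.
        while pos < n and source[pos].isspace():
            pos += 1
        if source.startswith("/*", pos):
            end = source.find("*/", pos + 2)
            if end == -1:
                pos = n  # unterminated block comment: the rest is comment
                break
            pos = end + 2
        elif source.startswith("//", pos):
            end = source.find("\n", pos)
            if end == -1:
                pos = n
                break
            pos = end + 1
        else:
            break

    return source[:pos], source[pos:]
```

In a triage pipeline, the size and content of the first element could be logged as a signal in its own right, while only the second element is passed on for analysis.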

Targeted package categories

  • MCP/AI-themed: langchain-core-mcp, openai-mcp, instructor-mcp, tiktoken-mcp, ray-mcp-server — explicitly targeting MCP developers.
  • Bioinformatics trojans: embiggen, ensmallen, gpsea, phenopacket-store-toolkit — trojanized legitimate research tools with malicious code hidden in compiled native extensions.
  • Typosquats: rsquests (requests), tlask/rlask (Flask) — designed to capture developer installs via common typos.
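
The typosquats listed above each differ from the real project name by a single character. A minimal sketch of how a dependency name could be screened against a list of well-known packages using Levenshtein edit distance follows; the well-known list passed in is a hypothetical input, and a distance threshold of 1 is an assumption chosen to match these examples.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 does for comparisons."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def likely_typosquats(candidate: str, well_known: list[str],
                      max_distance: int = 1) -> list[str]:
    """Return the well-known names that the candidate closely resembles."""
    name = normalize(candidate)
    matches = []
    for known in well_known:
        target = normalize(known)
        # An exact match is the real package, not a lookalike.
        if name != target and edit_distance(name, target) <= max_distance:
            matches.append(known)
    return matches
```

For example, likely_typosquats("rsquests", ["requests", "flask"]) returns ["requests"]. Short legitimate names can also sit one edit apart from each other, so a match is a prompt for manual review.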

Why it matters

The campaign has operated continuously since June 1, pivoting delivery mechanisms every 48 to 72 hours. No CVEs exist for any artifact because these are supply chain attacks, not software vulnerabilities. The split-staging architecture (loader without payload in the same package) and native extension obfuscation represent a significant step up in sophistication for PyPI malware. The LLM anti-analysis technique is the first documented case of malware specifically designed to deceive automated AI security tools.
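
Because the split-staging loader looks for its payload in every sys.path entry and one directory below, a defender can walk the same search space. The sketch below is an illustration under that assumption; the default file name _index.js comes from the article, while the function name and return shape are hypothetical.

```python
import sys
from pathlib import Path


def find_staged_payloads(search_roots: list[str],
                         payload_name: str = "_index.js") -> list[str]:
    """Find payload_name in each root and in its immediate subdirectories."""
    hits: list[str] = []
    seen: set[str] = set()
    for entry in search_roots:
        root = Path(entry)
        if not root.is_dir():
            continue  # sys.path can hold zip files or stale entries
        candidates = [root / payload_name]
        try:
            candidates += [
                child / payload_name
                for child in sorted(root.iterdir())
                if child.is_dir()
            ]
        except OSError:
            pass  # unreadable directory: keep the top-level candidate only
        for candidate in candidates:
            if candidate.is_file():
                resolved = str(candidate.resolve())
                if resolved not in seen:  # sys.path entries can overlap
                    seen.add(resolved)
                    hits.append(resolved)
    return hits


if __name__ == "__main__":
    for path in find_staged_payloads(sys.path):
        print(path)
```

A hit only shows that a file with that name exists where the loader would look; its contents still need to be examined.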

What to do

  • Block or remove all 23 newly identified PyPI artifacts (see IOCs in source).
  • Audit Python environments for unexpected .pth files, especially loaders matching the *-setup.pth pattern.
  • Scan for unexpected .abi3.so native extensions in pure-Python packages.
  • Use package integrity verification tools (Socket, osv-scanner) that inspect actual wheel contents, not just source.
  • Be skeptical of AI-assisted security analysis results — the campaign now includes countermeasures against them.
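
Since a wheel is a zip archive, its contents can be listed before installation without executing anything. The following is a rough sketch, assuming the three indicator types named in this article (.pth files, native extensions, and bundled JavaScript) are what a reviewer wants surfaced; the category names and function names are illustrative only.

```python
import zipfile
from pathlib import PurePosixPath

NATIVE_SUFFIXES = {".so", ".pyd", ".dylib", ".dll"}
JAVASCRIPT_SUFFIXES = {".js", ".mjs", ".cjs"}


def classify_wheel_members(names: list[str]) -> dict[str, list[str]]:
    """Group archive member names by the indicator type they match."""
    findings: dict[str, list[str]] = {"pth": [], "native": [], "javascript": []}
    for name in names:
        # Only the final suffix matters, so "core.abi3.so" counts as ".so".
        suffix = PurePosixPath(name).suffix.lower()
        if suffix == ".pth":
            findings["pth"].append(name)
        elif suffix in NATIVE_SUFFIXES:
            findings["native"].append(name)
        elif suffix in JAVASCRIPT_SUFFIXES:
            findings["javascript"].append(name)
    return findings


def inspect_wheel(wheel):
    """List indicator files in a wheel given a path or a binary file object."""
    with zipfile.ZipFile(wheel) as archive:
        return classify_wheel_members(archive.namelist())
```

Native extensions are normal in many packages, so results are most useful when compared against what the project is expected to ship, for example a package that is documented as pure Python.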

Sources