arXiv PinTrace — LLMs Systemically Pin Vulnerable Dependency Versions

AI relevance: AI coding agents that generate requirements.txt or pyproject.toml files routinely pin library versions carrying known critical CVEs — the bias is baked into training data, meaning switching to a "better" model does not fix the problem.

  • Wang et al. evaluated 10 LLMs on PinTrace, a 1,000-task Python benchmark drawn from Stack Overflow, instrumenting every generated dependency specification against the National Vulnerability Database (arXiv:2605.06279).
  • Depending on the model, 36.70%–55.70% of tasks yield output that includes at least one library at a version with a known CVE.
  • All ten models converge on the same small set of risky releases — the failure is systemic, caused by co-occurrence bias in the training corpus, not per-model behavior.
  • Of pinned versions that carry CVEs, 62.75%–74.51% are rated Critical or High severity.
  • 72.27%–91.37% of vulnerable versions were disclosed before the model's training cutoff — the models had access to the information but selected vulnerable versions anyway due to corpus frequency bias.
  • Manifest files (requirements.txt, pyproject.toml) receive version specifications least often (at a rate of 6.45%–59.19%), yet they are the surface that controls reproducible installs.
  • Static install success rates for model-pinned versions range from 19.70% to 63.20%, and functional test pass rates from 6.49% to 48.62%. Vulnerable versions that do install introduce CVEs silently, while broken versions at least fail visibly and force a fix.
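
The two headline percentages above use different denominators: the first is counted per task, the second per vulnerable pinned version. A minimal sketch of that distinction follows. The Pin record and both function names are hypothetical, and counting distinct (package, version) pairs is an assumption rather than the paper's confirmed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """One pinned dependency in a model's output (hypothetical record type)."""
    package: str
    version: str
    # Severity labels of known CVEs for this exact version; empty means none known.
    severities: tuple = ()


def task_vulnerable_rate(tasks):
    """Fraction of tasks with at least one pin that carries a known CVE."""
    if not tasks:
        return 0.0
    hit = sum(1 for pins in tasks if any(p.severities for p in pins))
    return hit / len(tasks)


def critical_high_share(tasks):
    """Among distinct vulnerable (package, version) pairs, the fraction
    that has at least one Critical or High CVE."""
    vulnerable = {
        (p.package, p.version): p.severities
        for pins in tasks
        for p in pins
        if p.severities
    }
    if not vulnerable:
        return 0.0
    severe = sum(
        1 for sev in vulnerable.values() if {"CRITICAL", "HIGH"} & set(sev)
    )
    return severe / len(vulnerable)


if __name__ == "__main__":
    # Toy data with placeholder package names; not taken from the benchmark.
    tasks = [
        [Pin("liba", "1.0", ("HIGH",)), Pin("libb", "2.0")],
        [Pin("libb", "2.0")],
        [Pin("libc", "0.1", ("LOW",))],
        [Pin("liba", "1.0", ("HIGH",))],
    ]
    print(task_vulnerable_rate(tasks))
    print(critical_high_share(tasks))
```

A single widely copied risky release can push the per-task rate up across many tasks while counting only once in the per-version severity share, which is why the two figures should not be compared directly.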

Why it matters

AI coding agents are increasingly used to scaffold projects and generate dependency manifests. When 37–56% of tasks yield output that pins at least one CVE-carrying version, and switching models does not fix it, organizations need infrastructure-level guards rather than better prompts. The CVE feed has no signal path into the model's co-occurrence prior.

What to do

  • Run pip-audit (or npm audit for JavaScript projects) as a blocking CI gate on any manifest generated by an AI agent, and enable Dependabot security updates on the repository.
  • Use an internal package mirror (Artifactory, Nexus) that blocks known-vulnerable versions at install time.
  • Pair agent output with automated version bumping via Renovate or Dependabot so that versions which were safe at merge time stay current.
  • Lock then resolve: pipe agent-generated requirements.txt through pip-compile, uv lock, or poetry lock in a clean environment before merging.
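
The blocking-gate idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for pip-audit: the BLOCKED table and the package names are hypothetical placeholders, and the parser handles only simple "name==version" lines (no extras, environment markers, hashes, or URLs).

```python
import re

# Matches simple "name==version" lines only; anything else counts as not exactly pinned.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([A-Za-z0-9.!+*-]+)$")


def parse_requirements(text):
    """Split requirements text into exact pins and all other entries."""
    pins, unpinned = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = PIN_RE.match(line)
        if match:
            # Normalize the name so "Example_Lib" and "example-lib" compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
        else:
            unpinned.append(line)
    return pins, unpinned


def gate(text, blocked):
    """Return a list of problems; an empty list means the manifest passes."""
    pins, unpinned = parse_requirements(text)
    problems = [f"not exactly pinned: {line}" for line in unpinned]
    for name, version in sorted(pins.items()):
        if version in blocked.get(name, set()):
            problems.append(f"blocked version: {name}=={version}")
    return problems


if __name__ == "__main__":
    # Hypothetical advisory table; a real gate would consult pip-audit
    # or an advisory database instead of a hard-coded dict.
    BLOCKED = {"examplelib": {"1.0.0"}}
    manifest = "examplelib==1.0.0\notherlib>=2.0  # loose range\nsafelib==3.1.4\n"
    problems = gate(manifest, BLOCKED)
    for problem in problems:
        print(problem)
    # A CI job would exit non-zero here when problems is non-empty.
```

Keeping the check as a pure function that returns a list makes it easy to unit-test and to wrap in whatever exit-code convention the CI system expects.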

Sources