arXiv — CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability

2026-02-09 Research by al-ice.ai Editorial

AI relevance: CVE-Factory builds agent-ready, executable vulnerability tasks so security teams can evaluate and train AI coding agents against real CVE conditions.

CVE-Factory is a multi-agent pipeline that turns sparse CVE metadata into fully executable security tasks, avoiding costly manual reproduction.
The authors report 95% solution correctness and 96% environment fidelity when cross-validating against human expert reproductions.
The system auto-generates a continuously updated benchmark, LiveCVEBench, spanning 190 tasks, 14 languages, and 153 repositories.
LiveCVEBench explicitly includes AI-tooling vulnerabilities, keeping the benchmark aligned with modern agent stacks.
CVE-Factory also synthesizes 1,000+ executable training environments for scaling agentic code-security training.
A fine-tuned Qwen3-32B model improves from 5.3% to 35.8% on LiveCVEBench, beating Claude 4.5 Sonnet on the reported setup.
All assets are open-sourced: CVE-Factory, LiveCVEBench, Abacus-cve model, datasets, and leaderboard.
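
To put the reported percentages in concrete terms, the short sketch below converts them into approximate task counts. It assumes the score is a simple pass rate over all 190 LiveCVEBench tasks, which the summary above does not state; the function name is illustrative only.

```python
# Assumption: the reported score is (tasks solved) / (all 190 tasks).
# The summary does not confirm this; treat the counts as rough estimates.
TOTAL_TASKS = 190  # task count quoted in the summary


def implied_solved(pass_rate_pct: float, total: int = TOTAL_TASKS) -> int:
    """Nearest whole number of tasks consistent with a reported percentage."""
    return round(pass_rate_pct / 100 * total)


baseline = implied_solved(5.3)     # base Qwen3-32B, as reported
finetuned = implied_solved(35.8)   # fine-tuned model, as reported
gain_points = round(35.8 - 5.3, 1) # improvement in percentage points

print(f"baseline ~{baseline} tasks, fine-tuned ~{finetuned} tasks, "
      f"gain {gain_points} points")
```

Under that assumption, the jump corresponds to roughly 10 solved tasks before fine-tuning and roughly 68 after.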

Why it matters

Agentic security evaluations are stuck on stale or hand-built tasks; automated CVE-to-task pipelines unlock continuous, realistic testing.
LiveCVEBench offers a shared, evolving benchmark for measuring how well code agents handle real-world vulnerability reproduction.
Open-source artifacts let defenders stress-test AI coding assistants before deploying them in CI/CD or production.

What to do

Review LiveCVEBench: Use its tasks to evaluate your internal coding agents or security copilots.
Adopt the pipeline: If you maintain vulnerability programs, consider adding CVE-Factory-style generation for continuous testing.
Benchmark model upgrades: Track improvements against LiveCVEBench before granting agents higher privileges.
Contribute: Submit new CVE task data or reproduce findings to keep the benchmark fresh.
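
The "benchmark model upgrades" step can be sketched as a simple promotion gate. This is a minimal illustration, not part of CVE-Factory or LiveCVEBench: the TaskResult record, the function names, and the threshold values are all hypothetical, and how you collect per-task pass/fail results depends on your own harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One benchmark outcome. Hypothetical record, not a LiveCVEBench format."""
    task_id: str
    passed: bool


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def may_promote(candidate: list[TaskResult],
                baseline: list[TaskResult],
                min_rate: float = 0.30,
                min_gain: float = 0.0) -> bool:
    """Allow a privilege increase only if the candidate agent clears an
    absolute pass-rate floor and does not regress against the baseline.
    Both thresholds are placeholder values to tune for your own risk model."""
    cand = pass_rate(candidate)
    base = pass_rate(baseline)
    return cand >= min_rate and (cand - base) >= min_gain


# Example: current agent solves 1 of 10 tasks, upgraded agent solves 4 of 10.
old_run = [TaskResult(f"task-{i}", i < 1) for i in range(10)]
new_run = [TaskResult(f"task-{i}", i < 4) for i in range(10)]
print(may_promote(new_run, old_run))
```

Combining an absolute floor with a no-regression check keeps a model from being promoted just because it beats a weak predecessor.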
