Microsoft — turning threat reports into detection insights with AI
- Category: Security
- Problem: turning long, messy threat intel / red-team reports into usable detection engineering work is slow (days to weeks), and it’s easy to miss details.
- Workflow idea: use an LLM to extract candidate TTPs + metadata from reports, then normalize and map them to MITRE ATT&CK.
- Coverage step: compare extracted TTPs against your existing detection catalog to label each item as “likely covered” vs “likely gap.”
- How they reduce false positives: combine vector similarity search (to shortlist candidate detections) with LLM-based validation (to check whether the mapping is actually plausible).
- Context preservation: ingestion keeps document structure (headings/lists/etc.) because where a detail appears can change how it should be interpreted.
- Output: a prioritized list of detection opportunities + likely gaps — explicitly framed as a starting point, not an auto-ship decision.
- Explicit guardrails: use structured outputs (schemas), deterministic prompts for critical steps, and reviewer checkpoints for “coverage vs gap” conclusions.
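The structured-output guardrail above can be sketched in a few lines. This is a hypothetical schema and validator, not Microsoft's actual one; the field names (technique_id, evidence_snippet, confidence, telemetry_needed) are illustrative, taken from the extraction step described above.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one extracted TTP candidate. Field names are
# illustrative; the real pipeline's schema is not public in this level of detail.
@dataclass
class ExtractedTTP:
    technique_id: str              # e.g. "T1059.001" (ATT&CK technique/sub-technique)
    evidence_snippet: str          # verbatim report text supporting the claim
    confidence: float              # model-reported confidence in [0, 1]
    telemetry_needed: list = field(default_factory=list)

def validate(item: dict) -> ExtractedTTP:
    """Reject malformed LLM output before it enters the rest of the pipeline."""
    if not str(item.get("technique_id", "")).startswith("T"):
        raise ValueError(f"not an ATT&CK technique id: {item.get('technique_id')!r}")
    conf = float(item.get("confidence", -1))
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence out of range: {conf}")
    if not item.get("evidence_snippet"):
        raise ValueError("missing evidence snippet")
    return ExtractedTTP(
        technique_id=item["technique_id"],
        evidence_snippet=item["evidence_snippet"],
        confidence=conf,
        telemetry_needed=list(item.get("telemetry_needed", [])),
    )
```

Rejecting at the schema boundary is what makes the downstream "covered vs gap" labels auditable: every item that survives has a technique id, verbatim evidence, and a bounded confidence.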
Why it matters
- High-signal reports pile up during active campaigns; anything that cuts “time-to-first-detection-draft” without lowering rigor helps.
- This is a practical pattern for security automation: LLMs for extraction + normalization, classic information-retrieval tools for shortlisting, and humans for final validation.
- It also highlights a non-obvious failure mode: “coverage” inferred from text similarity can be wrong when telemetry isn’t present, scope differs, or correlation logic is missing — so you need explicit validation loops.
What to do
- Build a detection catalog you can query: standardize fields (title, description, ATT&CK mappings, code/query language, required telemetry) and make it searchable.
- Automate TTP extraction as a first pass: run LLM extraction into a strict schema (technique, evidence snippet, confidence, telemetry needed).
- Do two-stage matching: vector search to shortlist candidate detections, then an LLM (or rule-based checks) to validate whether the match really covers the behavior.
- Gate “likely gaps” with real evidence: simulate or replay telemetry (where possible) before investing deeply in new detections.
- Measure drift: keep a small gold set of reports + expected TTPs/mappings so prompt/model changes don’t silently degrade quality.
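The two-stage matching step can be sketched as follows. This is a toy illustration, not the production pipeline: the catalog, embeddings, and threshold are made up, and the stage-2 LLM validation is replaced by a simplified rule (does the environment actually collect the detection's required telemetry?) so the sketch stays runnable.

```python
import math

# Toy detection catalog; vectors stand in for real embeddings of the
# detection descriptions, and titles/telemetry fields are invented.
CATALOG = [
    {"title": "Encoded PowerShell execution", "vec": [0.9, 0.1, 0.0],
     "required_telemetry": {"process_creation", "command_line"}},
    {"title": "LSASS memory access", "vec": [0.1, 0.9, 0.0],
     "required_telemetry": {"process_access"}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_ttp(ttp_vec, available_telemetry, k=2, threshold=0.7):
    # Stage 1: vector similarity shortlist of candidate detections.
    shortlist = sorted(CATALOG, key=lambda d: cosine(ttp_vec, d["vec"]),
                       reverse=True)[:k]
    # Stage 2: validate each candidate. Text similarity alone is not
    # coverage -- here we at least require that the telemetry the
    # detection depends on is actually collected.
    for det in shortlist:
        if (cosine(ttp_vec, det["vec"]) >= threshold
                and det["required_telemetry"] <= available_telemetry):
            return ("likely covered", det["title"])
    return ("likely gap", None)
```

Note how the same TTP flips from "likely covered" to "likely gap" when the required telemetry is absent; that is exactly the failure mode called out above, where similarity-based coverage claims break down without an explicit validation check.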
Sources
- Microsoft Security Blog: Turning threat reports into detection insights with AI
- MITRE ATT&CK: https://attack.mitre.org/
- Referenced paper (ACSAC 2025): Towards Autonomous Detection Engineering