RAGShield — Numerical Claim Manipulation in RAG Systems Evades Embedding Defenses

AI relevance: Any RAG pipeline that retrieves and presents numerical data — financial records, legal documents, scientific results, or operational metrics — is vulnerable to silent numerical manipulation that embedding-based similarity checks cannot detect.

Key Findings

  • Embedding blind spot proven: changing a tax deduction figure by $50,000 produces a cosine similarity of 0.9998 — effectively invisible to every known embedding-based detection threshold.
  • 1,459× sensitivity gap: across 174 manipulation pairs and two embedding models, the mean distance change for numerical alterations was 1,459 times smaller than for semantically equivalent text changes.
  • Existing defenses miss 79–90% of attacks: embedding-based RAG defenses failed to detect the vast majority of numerical manipulation attacks tested on real IRS document content.
  • RAGShield detects 100% of the tested attacks: the proposed defense operates directly on extracted values rather than on embeddings. A pattern-based engine identifies dollar amounts and percentages, links each value to its governing entity via two-pass context propagation (99.8% entity detection on 2,742 IRS passages), and verifies each claim against a cross-source registry.
  • Root cause: text embeddings encode topic and semantics, not numerical precision. Two passages that differ only in a single number are near-identical in embedding space.
  • Temporal tracking: RAGShield adds a temporal tracker that flags value changes falling outside known government update schedules, catching gradual manipulation campaigns.
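
The value-level comparison described in these findings can be illustrated with a minimal Python sketch. It is not RAGShield's actual engine: the regular expression, the function names extract_values and numeric_diff, and the sample passages and figures are all assumptions made for illustration. It shows only that two passages differing in a single number become trivially distinguishable once their numbers are parsed out and compared directly.

```python
import re
from decimal import Decimal
from itertools import zip_longest

# Hypothetical pattern (an assumption, not taken from RAGShield):
# dollar amounts such as $50,000 or $1,234.56, and percentages such as 7.5%.
VALUE_PATTERN = re.compile(
    r"\$\s?(?P<dollars>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
    r"|(?P<percent>\d+(?:\.\d+)?)\s?%"
)


def extract_values(text):
    """Return a list of (unit, Decimal) pairs in order of appearance."""
    values = []
    for match in VALUE_PATTERN.finditer(text):
        if match.group("dollars") is not None:
            amount = match.group("dollars").replace(",", "")
            values.append(("USD", Decimal(amount)))
        else:
            values.append(("PCT", Decimal(match.group("percent"))))
    return values


def numeric_diff(original, candidate):
    """Pair values by position and return the pairs that disagree.

    A value missing on one side is paired with None.
    """
    pairs = zip_longest(extract_values(original), extract_values(candidate))
    return [(a, b) for a, b in pairs if a != b]


# Illustrative passages; the figures are made up, not real IRS values.
trusted = "The deduction limit is $10,000 and the rate is 12%."
tampered = "The deduction limit is $60,000 and the rate is 12%."

print(numeric_diff(trusted, tampered))
```

Here numeric_diff reports exactly one mismatch, the dollar amount, while the unchanged percentage is ignored. Positional pairing is a simplification; linking each value to the entity it governs, as the findings describe, is what makes such a comparison reliable on real documents.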

Why It Matters

The vulnerability is not limited to government RAG systems. Any enterprise RAG deployment that surfaces numerical data (financial reports, pricing, engineering specifications, medical dosages, legal thresholds) faces the same risk. An attacker who can inject or swap a single document in a knowledge base can change critical numbers without being detected by standard retrieval similarity checks. The PoisonedRAG paper (USENIX Security 2025) showed that knowledge poisoning is practical with just 5 injected texts; RAGShield reveals that even when you defend against content poisoning, numerical manipulation slips through.

What to Do

  • Extract and verify numbers: For RAG systems handling numerical data, parse out dollar amounts, percentages, and quantities from retrieved passages and cross-reference against trusted source documents.
  • Don't rely on embedding similarity alone: Cosine similarity cannot catch numerical changes. Add numerical-aware validation as a separate pipeline stage.
  • Multi-source corroboration: When critical numbers appear in RAG outputs, require corroboration from at least two independent sources in the knowledge base.
  • Track value changes over time: Maintain a registry of known values and flag deviations that don't correspond to legitimate document updates.
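
The registry, corroboration, and change-tracking recommendations above can be combined in one small validation stage. The sketch below is a hypothetical design, not RAGShield's implementation: the class names ValueRegistry and RegistryEntry, the status strings, the two-source threshold, and the notion of an "update window" are all assumptions chosen to mirror the recommendations.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class RegistryEntry:
    # Maps each known value to the set of source ids that assert it.
    sources_by_value: dict = field(default_factory=dict)
    # Periods (start, end) in which a legitimate update is expected.
    update_windows: list = field(default_factory=list)


class ValueRegistry:
    """Hypothetical registry of trusted values, keyed by entity name."""

    def __init__(self, min_sources=2):
        self.min_sources = min_sources
        self._entries = {}

    def record(self, entity, value, source_id):
        """Register that source_id asserts this value for the entity."""
        entry = self._entries.setdefault(entity, RegistryEntry())
        entry.sources_by_value.setdefault(value, set()).add(source_id)

    def allow_updates(self, entity, start, end):
        """Declare a window in which the entity's value may legitimately change."""
        entry = self._entries.setdefault(entity, RegistryEntry())
        entry.update_windows.append((start, end))

    def check(self, entity, value, seen_on):
        """Classify a value extracted from a retrieved passage."""
        entry = self._entries.get(entity)
        if entry is None:
            return "unknown_entity"
        sources = entry.sources_by_value.get(value, set())
        if len(sources) >= self.min_sources:
            return "verified"  # corroborated by enough independent sources
        in_window = any(start <= seen_on <= end
                        for start, end in entry.update_windows)
        # An uncorroborated value is only plausible during an update window.
        return "possible_update" if in_window else "flagged"


# Illustrative data; entity names, sources, and figures are made up.
registry = ValueRegistry(min_sources=2)
registry.record("deduction_limit", Decimal("10000"), "doc_a")
registry.record("deduction_limit", Decimal("10000"), "doc_b")
registry.allow_updates("deduction_limit", date(2025, 1, 1), date(2025, 1, 31))

print(registry.check("deduction_limit", Decimal("60000"), date(2025, 6, 1)))
```

In this example the altered figure has no corroborating source and appears outside the declared update window, so it is flagged. A production system would also need a policy for resolving "possible_update" results, which this sketch leaves to human review.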

Sources