Paubox — Invisible text prompt injection bypasses AI email filters
AI relevance: Phishing campaigns are weaponizing indirect prompt injection through invisible text layers in email HTML to defeat AI-powered email security filters — a structural failure mode where the scanner processes content the human recipient never sees.
- Researchers identified active phishing campaigns embedding invisible text in emails by setting font size to zero or matching text color to background, making content unreadable to humans but fully visible to AI models scanning the message.
- The hidden text is drawn from high-reputation sources — real brand newsletters, fiction websites, archived brand emails — flooding the email with benign signals designed to dilute malicious content and shift AI filter classification toward safe.
- Two campaigns detected: one used cloned Adidas newsletter content to disguise a cloud storage scam; another used a fake health insurance email padded with embedded fiction to impersonate a legitimate content platform.
- The technique currently accounts for less than 1% of observed phishing traffic, but researchers warn it signals a direction of travel as AI tools take on more autonomous email security roles.
- Unlike traditional prompt injection, which forces an AI to act outside its design, these attacks steer the AI toward incorrect decisions within its normal operating parameters, which makes them harder to detect.
- Google Gemini's July 2025 zero-font prompt injection flaw, which let attackers hide directives that Gemini would follow when summarizing emails, demonstrated the same hiding technique months before these phishing campaigns.
- OpenAI's CISO acknowledged that "prompt injection remains a frontier, unsolved security problem" — highlighting the gap between detection ambition and technical reality.
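The hiding tricks described in the bullets above (zero font size, text color matched to the background) can be illustrated with a short Python sketch that splits an email's HTML into what a recipient would see and what only a source-level scanner would read. It is a minimal illustration under stated assumptions, not the researchers' tooling: it inspects inline `style` attributes only, assumes a white default background, and the names `VisibilitySplitter`, `split_visibility`, and `SAMPLE` are hypothetical. Real emails can also hide text through stylesheets, opacity, or off-screen positioning, none of which is handled here.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns treated as "hidden" in this sketch (assumed, not exhaustive).
ZERO_FONT = re.compile(r"font-size\s*:\s*0+(?:\.0+)?(?![\d.])", re.I)
DISPLAY_NONE = re.compile(r"display\s*:\s*none", re.I)
COLOR = re.compile(r"(?<![-\w])color\s*:\s*([^;]+)", re.I)
BACKGROUND = re.compile(r"background(?:-color)?\s*:\s*([^;]+)", re.I)
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link", "wbr"}
COLOR_ALIASES = {"white": "#ffffff", "#fff": "#ffffff",
                 "black": "#000000", "#000": "#000000"}


def normalize(color):
    """Lowercase a CSS color and expand a few common aliases."""
    c = color.strip().lower()
    return COLOR_ALIASES.get(c, c)


class VisibilitySplitter(HTMLParser):
    """Hypothetical helper: sorts text nodes into visible and hidden buckets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []    # one (tag, hidden, background) frame per open element
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = self.stack[-1][1] if self.stack else False
        parent_bg = self.stack[-1][2] if self.stack else "#ffffff"  # assumed default
        bg_match = BACKGROUND.search(style)
        bg = normalize(bg_match.group(1)) if bg_match else parent_bg
        color_match = COLOR.search(style)
        same_color = bool(color_match) and normalize(color_match.group(1)) == bg
        hidden = (parent_hidden or same_color
                  or bool(ZERO_FONT.search(style))
                  or bool(DISPLAY_NONE.search(style)))
        self.stack.append((tag, hidden, bg))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        hidden = self.stack[-1][1] if self.stack else False
        (self.hidden if hidden else self.visible).append(text)


def split_visibility(html):
    """Return (visible_text, hidden_text) for an HTML email body."""
    parser = VisibilitySplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible), " ".join(parser.hidden)


# Made-up sample: a short lure padded with benign-looking hidden filler.
SAMPLE = (
    '<div style="background-color:#ffffff">'
    '<p>Your storage is full. <a href="https://example.invalid/upgrade">Upgrade now</a></p>'
    '<span style="font-size:0px">Spring collection: new running shoes in store</span>'
    '<span style="color:#fff">Chapter one. The rain had stopped by morning.</span>'
    '</div>'
)

visible, hidden = split_visibility(SAMPLE)
print("recipient sees:", visible)
print("scanner-only  :", hidden)
```

A scanner that reads the raw HTML receives both strings; the recipient sees only the first. That mismatch is the opening these campaigns exploit.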
Why it matters
AI email security tools were marketed as the next generation of phishing defense. This technique demonstrates that the very mechanism that makes them better, holistic content analysis, also creates a new attack surface. The gap between what the AI processes (full HTML source) and what the human sees (rendered output) is a fundamental architectural problem. Even at under 1% of phishing traffic, the absolute volume of bypassed messages grows as AI filters are increasingly deployed as primary controls.
What to do
- Strip or sanitize zero-font and color-matched hidden text from emails before AI scanning: only content that actually renders for the recipient should reach the model.
- Require AI email filters to weight visible content (what the human sees) more heavily than hidden HTML elements when making classification decisions.
- Monitor for emails with suspicious HTML structure, such as hidden text blocks that are disproportionately large relative to the visible content.
- Layer AI email security with traditional signals (sender reputation, link analysis, DKIM/SPF/DMARC) rather than relying on content analysis alone.
- Treat this as a design problem for AI security vendors: the scanner must assess full message context, not surface-level links or keywords in isolation.
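The recommendations above can be sketched as a small layered check in Python. This is a rough illustration under stated assumptions, not a vendor implementation: it assumes an upstream step has already separated visible from hidden text, `toy_score` is a deliberately naive keyword scorer standing in for a real AI classifier, and `EmailSignals`, `classify`, and both thresholds (0.5 and 0.7) are hypothetical choices made for the example.

```python
from dataclasses import dataclass

RISKY_WORDS = {"urgent", "verify", "password", "suspended", "upgrade"}


@dataclass
class EmailSignals:
    """Hypothetical per-message inputs, produced by earlier pipeline stages."""
    visible_text: str   # what the recipient would actually see
    hidden_text: str    # zero-font or color-matched text found in the HTML
    dmarc_pass: bool    # result of the usual sender authentication check


def toy_score(text):
    """Stand-in for an AI content score: density of risky words, capped at 1.0."""
    words = [w.strip(".,:;!?").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(w in RISKY_WORDS for w in words)
    return min(1.0, 5 * hits / len(words))


def hidden_ratio(sig):
    """Share of the message's characters that the recipient never sees."""
    total = len(sig.visible_text) + len(sig.hidden_text)
    return len(sig.hidden_text) / total if total else 0.0


def classify(sig, score_content=toy_score, ratio_limit=0.5, risk_limit=0.7):
    """Combine content, structure, and authentication signals into one verdict."""
    reasons = []
    # Score only the rendered text so hidden padding cannot dilute the result.
    if score_content(sig.visible_text) >= risk_limit:
        reasons.append("visible content scored as risky")
    # Structural signal: hidden text that outweighs the visible message.
    if hidden_ratio(sig) > ratio_limit:
        reasons.append("hidden text outweighs visible text")
    # Traditional signal, independent of any content analysis.
    if not sig.dmarc_pass:
        reasons.append("sender authentication failed")
    return ("quarantine" if reasons else "deliver"), reasons


padded = EmailSignals(
    visible_text="Urgent: verify your password now",
    hidden_text="new running shoes in store this spring " * 20,
    dmarc_pass=True,
)
benign = EmailSignals("Lunch at noon on Friday?", "", True)

# Scoring everything the scanner reads lets the padding drag the score down.
diluted = toy_score(padded.visible_text + " " + padded.hidden_text)
print("score on full source :", round(diluted, 3))
print("score on visible text:", toy_score(padded.visible_text))
print(classify(padded))
print(classify(benign))
```

The design choice worth noting is that each signal can trigger quarantine independently, so padding that defeats the content score still trips the structural check.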