What is an AI-developed zero-day and why does it matter?

Per Google GTIG, this is the first time a criminal actor used AI to discover and weaponize an unknown vulnerability (zero-day) in the wild. The exploit was a 2FA bypass via Python script targeting a popular open-source system administration tool. The root cause: a semantic logic flaw (hardcoded trust assumption) — the type of vulnerability LLMs find well and traditional fuzzers cannot. This matters because it expands the attack surface to entire classes of bugs that traditional tools do not cover.

How did Google determine the exploit was AI-developed?

Per Google GTIG, four signatures indicated AI origin: (1) abundant educational docstrings (no human attacker writes didactic comments in exploit code); (2) a hallucinated CVSS score: the script contained a fabricated severity score that does not exist in any database, because the model invented the number; (3) a structured, textbook Pythonic format, with clean, annotated code matching LLM training data style; (4) detailed help menus and polished ANSI color classes, a level of polish that is excessive for exploit code. Google stated "high confidence that the actor used an AI model to assist in the discovery and weaponization of this vulnerability."

What is the difference between the criminal zero-day and the Big Sleep zero-day?

In October 2024, Big Sleep (Project Zero + DeepMind) found a zero-day in SQLite: defensive AI finding vulnerabilities so they can be fixed. The criminal zero-day identified by Google GTIG is the opposite case: offensive AI finding vulnerabilities to exploit. The vulnerability class also differs. Big Sleep found a memory safety bug (a stack buffer underflow in SQLite), while the criminal zero-day is a semantic logic flaw. LLMs reason about developer intent; fuzzers do not. The capability is the same and the intent is opposite.

Google GTIG: The First AI-Developed Zero-Day in the Wild

How fast is the exploitation timeline compressing?

Per Cogent Research (May 27), the time from CVE disclosure to functional exploit dropped from 125.3 days (January 2025) to 0.5 days (April 2026), and 62% of critical CVEs had exploits before scanner detection. Those figures cover known vulnerabilities. The Google GTIG zero-day is the next level: AI finding unknown vulnerabilities. The question is no longer whether AI zero-days will become common, but how fast.

Google GTIG identified the first AI-developed zero-day in the wild. And the code left its signature. Per the report published May 11, criminal actors used AI to discover and weaponize a vulnerability in a popular open-source system administration tool. The exploit was a 2FA bypass via Python script. The root cause: a semantic logic flaw (hardcoded trust assumption) — the type of vulnerability LLMs find well and traditional fuzzers cannot. We have been tracking this evolution closely, and the picture is clear: the attack surface just expanded to entire classes that our traditional tools do not cover.

The AI signature in exploit code

Google GTIG determined the exploit was developed with AI support through four unmistakable markers. First, abundant educational docstrings — no human attacker writes didactic comments in exploit code. Second, hallucinated CVSS: the script contained a fabricated severity score that does not exist in any database. The model invented the number. Third, structured textbook Pythonic format — clean, annotated code matching LLM training data style. Fourth, detailed help menus and polished ANSI color classes — excessively refined exploit code for a criminal purpose.
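To make the four markers concrete, here is a minimal sketch of how a defender might count them in a suspicious Python file. It is our own illustration, not GTIG's methodology: the function name `ai_style_markers`, the two regular expressions, and the `SAMPLE` text are all assumptions, and the output is a set of weak stylistic signals rather than proof of AI authorship.

```python
import ast
import re

# Assumed patterns: ANSI escapes as written in source text (e.g. \033[92m)
# and a "CVSS ... 9.8" style severity mention. Both are illustrative.
ANSI_ESCAPE = re.compile(r"\\(?:033|x1b)\[[0-9;]*m")
CVSS_MENTION = re.compile(r"CVSS[^\n]{0,20}?\d{1,2}\.\d", re.IGNORECASE)


def ai_style_markers(source: str) -> dict:
    """Return rough counts of the four stylistic markers in Python source."""
    tree = ast.parse(source)
    # Nodes that can carry a docstring: the module, classes, and functions.
    documentable = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    with_docstring = sum(1 for node in documentable if ast.get_docstring(node))
    # Keyword arguments named "help", as in argparse's add_argument(help=...).
    help_strings = sum(
        1 for node in ast.walk(tree)
        if isinstance(node, ast.keyword) and node.arg == "help"
    )
    return {
        "docstring_ratio": with_docstring / len(documentable),
        "cvss_mentions": len(CVSS_MENTION.findall(source)),
        "ansi_sequences": len(ANSI_ESCAPE.findall(source)),
        "help_strings": help_strings,
    }


# Benign, made-up sample that imitates the style described in the report.
SAMPLE = r'''
"""Educational proof of concept. CVSS score: 9.8 (Critical)."""
import argparse


class Colors:
    """ANSI color codes for terminal output."""
    GREEN = "\033[92m"
    RESET = "\033[0m"


def main():
    """Parse arguments and print a banner."""
    parser = argparse.ArgumentParser(description="Demo tool")
    parser.add_argument("--target", help="Host to check")
    parser.parse_args([])
'''

markers = ai_style_markers(SAMPLE)
print(markers)
```

None of these counts means much in isolation. A fully documented script with color classes and a severity score that cannot be found in any database is the combination worth a closer look, and any thresholds would need tuning against real samples.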

Google stated "high confidence that the actor used an AI model to assist in the discovery and weaponization of this vulnerability." It was not Gemini. It was not Mythos. The model was an unidentified commercial one. The plan was mass exploitation, but Google discovered the exploit proactively, before the large-scale campaign launched, and the vendor patched. Errors in the exploit implementation likely interfered with the criminals' success. But the fact that the attack did not reach scale does not diminish the milestone: AI is already discovering and weaponizing vulnerabilities that humans and traditional tools cannot find.

The vulnerability class that changes the game

Here is the point that matters for anyone defending infrastructure: the type of bug AI finds is different from what fuzzers find. The comparison with Big Sleep is revealing. In October 2024, Big Sleep (Project Zero + DeepMind) found a zero-day in SQLite: defensive AI finding vulnerabilities so they can be fixed. Now, per Google GTIG, we have the mirror image: offensive AI finding vulnerabilities to exploit. Same capability, opposite intent.

But the vulnerability class difference matters. Big Sleep found a memory safety bug — stack buffer underflow in SQLite. The criminal zero-day is a semantic logic flaw. LLMs reason about developer intent. Fuzzers do not. This expands the attack surface to entire classes that traditional tools do not cover. A fuzzer tests memory boundaries and malformed inputs. An LLM reads the code, understands what the developer intended, and finds where the logic broke. It is a fundamentally different vulnerability category.
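A small, deliberately simplified sketch shows what a hardcoded trust assumption can look like in a second-factor check. This is a hypothetical illustration and not the vulnerability from the GTIG report: `LoginRequest`, `EXPECTED_OTP`, the localhost shortcut, and both function names are invented for the example.

```python
import hmac
from dataclasses import dataclass


@dataclass
class LoginRequest:
    username: str
    password_ok: bool   # result of the first-factor check
    otp_code: str       # second-factor code supplied by the user
    source_ip: str


# Stand-in for a real TOTP verification (hypothetical fixed value).
EXPECTED_OTP = "492817"


def second_factor_flawed(req: LoginRequest) -> bool:
    """Flawed: requests that appear local are trusted without a code."""
    if req.source_ip == "127.0.0.1":  # hardcoded trust assumption
        return True
    return hmac.compare_digest(req.otp_code, EXPECTED_OTP)


def second_factor_fixed(req: LoginRequest) -> bool:
    """Fixed: every request must present a valid code."""
    return hmac.compare_digest(req.otp_code, EXPECTED_OTP)


def login(req: LoginRequest, second_factor) -> bool:
    """Grant access only if both factors pass."""
    return req.password_ok and second_factor(req)


# A request with a valid password, no OTP, and a local-looking address.
req = LoginRequest("admin", True, "", "127.0.0.1")
print(login(req, second_factor_flawed))  # True: second factor skipped
print(login(req, second_factor_fixed))   # False: code is required
```

Nothing crashes and no input is malformed, so a fuzzer watching for memory errors has nothing to report. The flaw is only visible to a reader who asks whether "local means trusted" is what the developer actually intended.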

For us, running offensive security daily, this distinction is practical, not theoretical. When we audit an application, semantic logic flaws — authorization bypasses, incorrect trust assumptions, broken authentication flows — are exactly the ones that require human reasoning. Until now, they were the hardest to find automatically. Now, an LLM with access to source code can identify them in minutes.

Time compression and the APT context

The numbers from Cogent Research (May 27) are stark: the time from CVE disclosure to functional exploit dropped from 125.3 days (January 2025) to 0.5 days (April 2026), and 62% of critical CVEs had exploits before scanner detection. Those figures cover known vulnerabilities. The Google GTIG zero-day is the next level: AI finding unknown vulnerabilities.
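To put the Cogent Research figures in proportion, the arithmetic can be sketched in a few lines. The helper name `compression` is ours; the two inputs are the figures quoted above, and everything else is derived from them.

```python
def compression(before_days: float, after_days: float) -> dict:
    """Compare two disclosure-to-exploit windows given in days."""
    return {
        "speedup": before_days / after_days,                    # how many times shorter
        "reduction_pct": (1 - after_days / before_days) * 100,  # percent of window lost
        "window_hours": after_days * 24,                        # current window in hours
    }


stats = compression(125.3, 0.5)
print(f"{stats['speedup']:.1f}x shorter window")        # 250.6x shorter window
print(f"{stats['reduction_pct']:.1f}% reduction")       # 99.6% reduction
print(f"{stats['window_hours']:.0f} hours to respond")  # 12 hours to respond
```

In other words, the window shrank by a factor of roughly 250, leaving about 12 hours between disclosure and a working exploit.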

And the context from the same report shows APT adoption of AI is broad. Per Google GTIG, APT45 (North Korea) sends thousands of prompts to analyze CVEs and validate PoCs. UNC2814 (China) uses persona-based jailbreaks for vulnerability research on embedded devices. APT27 (China) uses Gemini for ORB network management. PROMPTFLUX/HONESTCUE uses the Gemini API for dynamic obfuscation. This is not an isolated actor — it is an ecosystem.

Per Hultquist of GTIG: "There is a misconception that the AI vulnerability race is imminent. The reality is that it has already started."

We see this time compression in practice. One of our clients had a critical CVE exploited within 24 hours of disclosure this year. When the window is measured in hours, the manual process of assessment, prioritization, and patching cannot keep up. Automation is no longer aspirational — it is a survival requirement.

What this means for corporate defense

The race is no longer whether AI zero-days will become common. It is how fast. And the answer is not more fuzzing — it is semantic analysis. We run offensive security for corporate clients, and what we see in the field confirms it: the tools that protected infrastructure yesterday do not cover the vulnerability classes AI discovers today.

Traditional static analysis tools look for known patterns. Fuzzers test memory boundaries. Neither models developer intent. When an LLM reads authentication code and identifies that the developer hardcoded a trust assumption instead of validating explicitly, that is reasoning about semantics, not syntax.
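One way to approximate "modeling intent" without an LLM is to write the intended rule down as an executable specification and compare the real handler against it over a small input space. The sketch below does that under stated assumptions: `check_access`, `intent`, the roles, and the IP addresses are all hypothetical, and a real application would need a far richer input model.

```python
from itertools import product


def check_access(role: str, mfa_verified: bool, source_ip: str) -> bool:
    """Hypothetical handler under test, with a hardcoded trusted host."""
    if source_ip == "10.0.0.5":  # management host assumed to be safe
        return True
    return role == "admin" and mfa_verified


def intent(role: str, mfa_verified: bool, source_ip: str) -> bool:
    """Stated intent: access requires the admin role and verified MFA."""
    return role == "admin" and mfa_verified


def find_violations(handler) -> list:
    """List every input where the handler grants access the intent denies."""
    roles = ["admin", "user", "guest"]
    mfa_states = [True, False]
    ips = ["10.0.0.5", "203.0.113.7"]
    return [
        case for case in product(roles, mfa_states, ips)
        if handler(*case) and not intent(*case)
    ]


for case in find_violations(check_access):
    print("granted against intent:", case)
```

Every violation it prints comes from the trusted-host shortcut. The hard part in practice is not the comparison but writing down the intent, which is exactly the step that pattern matchers and fuzzers skip.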

At Tech86, our offensive security practice simulates exactly this type of reasoning. We do not rely solely on fuzzers and scanners — our pentest process includes semantic analysis of authorization logic, hardcoded trust assumptions, and threat model inconsistencies. When the attacker uses AI to find logic flaws, the defense needs the same class of reasoning to anticipate them.

