Google's Threat Intelligence Group (GTIG) has identified the first AI-developed zero-day in the wild, and the code left its signature. Per the report published May 11, criminal actors used AI to discover and weaponize a vulnerability in a popular open-source system administration tool. The exploit was a Python script that bypassed two-factor authentication (2FA). The root cause was a semantic logic flaw (a hardcoded trust assumption), the type of vulnerability that LLMs find well and traditional fuzzers are not designed to find. We have been tracking this evolution closely, and the picture is clear: the attack surface just expanded to entire vulnerability classes that our traditional tools do not cover.
The AI signature in exploit code
Google GTIG determined the exploit was developed with AI support through four markers. First, abundant educational docstrings: human attackers rarely write didactic comments in exploit code. Second, a hallucinated CVSS: the script contained a severity score that the model invented and that does not exist in any database. Third, structured, textbook Pythonic format: clean, annotated code matching the style of LLM training data. Fourth, detailed help menus and polished ANSI color classes: unusually refined code for a criminal exploit.
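As a rough illustration, stylistic markers of this kind can be counted mechanically. The sketch below is not GTIG's methodology: the function name `llm_style_markers`, the regular expressions, and the embedded sample are all hypothetical, and the output is a set of signals for an analyst to weigh, not a verdict on authorship.

```python
import ast
import re

# Hypothetical heuristics for illustration only; not GTIG's actual method.
# Matches ANSI color escapes as they appear in source text, e.g. \033[92m.
ANSI_ESCAPE = re.compile(r"\\(?:033|x1b)\[[0-9;]*m")
# Matches a severity claim such as "CVSS score: 9.8" so it can be checked
# by hand against a real vulnerability database.
CVSS_CLAIM = re.compile(r"CVSS[^\n]{0,20}?\b(?:10\.0|[0-9]\.[0-9])\b", re.IGNORECASE)


def llm_style_markers(source: str) -> dict:
    """Return simple stylistic signals for a Python script given as text."""
    tree = ast.parse(source)
    documentable = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    documented = [node for node in documentable if ast.get_docstring(node)]
    return {
        # Share of functions and classes that carry a docstring.
        "docstring_ratio": len(documented) / len(documentable) if documentable else 0.0,
        # Number of ANSI color escape sequences in the source text.
        "ansi_color_codes": len(ANSI_ESCAPE.findall(source)),
        # Severity claims found in the text, to be verified manually.
        "cvss_claims": CVSS_CLAIM.findall(source),
        # Crude proxy for a help menu: the script mentions argparse by name.
        "has_help_menu": "argparse" in source,
    }


# Invented sample script, written to show the markers; it does nothing harmful.
SAMPLE = r'''
import argparse

class Colors:
    """ANSI color codes for terminal output."""
    GREEN = "\033[92m"
    RESET = "\033[0m"

def main():
    """Parse arguments and run the demonstration."""
    # CVSS score: 9.8 (Critical)
    parser = argparse.ArgumentParser(description="demo")
    parser.parse_args([])
'''

if __name__ == "__main__":
    print(llm_style_markers(SAMPLE))
```

None of these signals is conclusive alone, since careful human authors also write docstrings and help menus. What matters is the combination, together with a severity score that cannot be found in any database.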
Google stated that it has "high confidence that the actor used an AI model to assist in the discovery and weaponization of this vulnerability." It was not Gemini, and it was not Mythos: the actor used an unidentified commercial model. The plan was mass exploitation, but Google discovered the exploit proactively, before the large-scale campaign launched, and the vendor patched the flaw. Errors in the exploit implementation likely interfered with the criminals' success. But the fact that the attack did not reach scale does not diminish the milestone: AI is already discovering and weaponizing vulnerabilities that traditional automated tools miss.
The vulnerability class that changes the game
Here is the point that matters for anyone defending infrastructure: the type of bug AI finds is different from what fuzzers find. The comparison with defensive work is revealing. In October 2024, Big Sleep (Project Zero + DeepMind) found a zero-day in SQLite: defensive AI finding vulnerabilities to fix. Now, per Google GTIG, we have offensive AI finding vulnerabilities to exploit. Same capability, opposite intent.
But the difference in vulnerability class matters. Big Sleep found a memory safety bug, a stack buffer underflow in SQLite. The criminal zero-day is a semantic logic flaw. LLMs reason about developer intent. Fuzzers do not. This expands the attack surface to entire vulnerability classes that traditional tools do not cover. A fuzzer tests memory boundaries and malformed inputs. An LLM reads the code, understands what the developer intended, and finds where the logic broke. It is a fundamentally different vulnerability category.
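To make the contrast concrete, here is a minimal sketch of a hardcoded trust assumption in a second-factor check. It is a hypothetical example: the proxy address, function names, and codes are invented, and it does not describe the actual vulnerable tool. A fuzz loop that only watches for crashes reports nothing, while a single check written from the developer's intent exposes the flaw.

```python
import hmac
import random
import string

# Hypothetical example: the address and names are invented for illustration
# and do not describe the actual vulnerable tool.
TRUSTED_PROXY = "10.0.0.5"


def second_factor_ok(client_ip: str, submitted: str, expected: str) -> bool:
    """Flawed check: requests from the 'internal' proxy skip code validation."""
    if client_ip == TRUSTED_PROXY:  # hardcoded trust assumption
        return True
    return hmac.compare_digest(submitted.encode(), expected.encode())


def crash_fuzz(rounds: int = 1000, seed: int = 0) -> int:
    """Feed random inputs and count unhandled exceptions (a crash-only oracle)."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(rounds):
        ip = rng.choice([TRUSTED_PROXY, "203.0.113.7"])
        code = "".join(rng.choices(string.printable, k=rng.randint(0, 64)))
        try:
            second_factor_ok(ip, code, "123456")
        except Exception:
            crashes += 1
    return crashes


def violates_intent() -> bool:
    """Intent check: a wrong code must never be accepted, whatever the address."""
    return second_factor_ok(TRUSTED_PROXY, "000000", "123456")


if __name__ == "__main__":
    print("crashes seen by the fuzz loop:", crash_fuzz())
    print("wrong code accepted from trusted address:", violates_intent())
```

The fuzz loop does hit the bypass many times, but nothing crashes, so a crash-only oracle has nothing to report. Finding the bug requires knowing what the function was supposed to guarantee.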
For us, running offensive security daily, this distinction is practical, not theoretical. When we audit an application, semantic logic flaws — authorization bypasses, incorrect trust assumptions, broken authentication flows — are exactly the ones that require human reasoning. Until now, they were the hardest to find automatically. Now, an LLM with access to source code can identify them in minutes.
Time compression and the APT context
The numbers from Cogent Research (May 27) are stark: the time from CVE disclosure to functional exploit dropped from 125.3 days (January 2025) to 0.5 days (April 2026), and 62% of critical CVEs had exploits before scanners could detect them. Those figures cover known vulnerabilities. The Google GTIG zero-day is the next level: AI finding unknown vulnerabilities.
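For a sense of scale, the two quoted figures can be turned into a compression factor and a response window in hours. This is plain arithmetic on the numbers above; the function name `time_compression` is illustrative.

```python
def time_compression(before_days: float, after_days: float) -> dict:
    """Summarize how much the disclosure-to-exploit window shrank."""
    return {
        # How many times shorter the window became.
        "factor": before_days / after_days,
        # Reduction expressed as a percentage of the original window.
        "reduction_pct": (1 - after_days / before_days) * 100,
        # The new window expressed in hours.
        "window_hours": after_days * 24,
    }


if __name__ == "__main__":
    # 125.3 days (January 2025) and 0.5 days (April 2026), as quoted above.
    summary = time_compression(125.3, 0.5)
    print(f"{summary['factor']:.1f}x shorter, "
          f"{summary['reduction_pct']:.1f}% reduction, "
          f"{summary['window_hours']:.0f} hours to respond")
```

On these figures the window is roughly 250 times shorter, a reduction of about 99.6%, leaving about 12 hours between disclosure and a working exploit.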
The context from the Google GTIG report shows that APT adoption of AI is broad. Per Google GTIG, APT45 (North Korea) sends thousands of prompts to analyze CVEs and validate PoCs. UNC2814 (China) uses persona-based jailbreaks for vulnerability research on embedded devices. APT27 (China) uses Gemini for ORB network management. PROMPTFLUX/HONESTCUE uses the Gemini API for dynamic obfuscation. This is not an isolated actor: it is an ecosystem.
Per Hultquist of GTIG: "There is a misconception that the AI vulnerability race is imminent. The reality is that it has already started."
We see this time compression in practice. One of our clients had a critical CVE exploited within 24 hours of disclosure this year. When the window is measured in hours, the manual process of assessment, prioritization, and patching cannot keep up. Automation is no longer aspirational — it is a survival requirement.
What this means for corporate defense
The question is no longer whether AI zero-days will become common, but how fast. And the answer is not more fuzzing: it is semantic analysis. We run offensive security for corporate clients, and what we see in the field confirms it: the tools that protected infrastructure yesterday do not cover the vulnerability classes AI discovers today.
Traditional static analysis tools look for known patterns. Fuzzers test memory boundaries. Neither models developer intent. When an LLM reads authentication code and identifies that the developer assumed hardcoded trust instead of validating explicitly, that is reasoning about semantics — not syntax.
At Tech86, our offensive security practice simulates exactly this type of reasoning. We do not rely solely on fuzzers and scanners — our pentest process includes semantic analysis of authorization logic, hardcoded trust assumptions, and threat model inconsistencies. When the attacker uses AI to find logic flaws, the defense needs the same class of reasoning to anticipate them.
