WithSecure has published the first forensically documented case of a threat group using commercial AI across every phase of the kill chain. The group, GREYVIBE, is Russia-aligned and has been active against Ukraine since August 2025. What sets this case apart is not the presence of AI but the fact that the AI is operational rather than experimental: ChatGPT, Gemini, and Ideogram are integrated into every phase of the attack.
The AI-generated kill chain
According to WithSecure, every phase of GREYVIBE's operations carries the signature of commercial AI. Visual lures were generated by Ideogram — the PrincessClub and PhantomClick campaigns used synthetic images with identifiable LLM watermarks. Phishing content in Ukrainian was produced by ChatGPT and Gemini. Obfuscation scripts — LOOKVALPS, LOOKVALJS, DAYLIGHT, and TEASOUP — were developed with AI assistance.
The central malware, LegionRelay, is a PowerShell RAT that WithSecure assessed as "substantially or entirely developed with AI coding assistance." The C2 backend, post-compromise commands, and infrastructure tooling: all AI-generated. This is not a group using AI to write better emails. This is a group using AI to build the entire offensive chain.
According to CiphersSecurity, in their analysis of the WithSecure report, this is "the first forensically documented case of AI-generated lures and AI-coded malware in the active operational pipeline of a single threat cluster." Previous reports of AI use by threat actors relied on prompt monitoring. Here, the evidence is in the artifacts themselves: the code and the images.
Five simultaneous campaigns, one operational model
According to WithSecure, GREYVIBE operated five simultaneous campaigns against Ukrainian and European targets:
- PhantomMail: spear-phishing via Google Drive and 4sync, targeting military and government personnel
- PhantomClick: ClickFix technique using a fake CAPTCHA to trick victims into executing malicious code
- PrincessClub: fake adult club sites with live WebRTC streaming as a lure
- DroneLink: fake military charity sites soliciting FPV drone donations, exploiting the Ukrainian war effort
- Nebo: mimicry of a Russian military login to deceive Ukrainian service members and capture credentials
Targeting concentrates on the Ukrainian military in the Kharkiv region, but also includes government, energy, telecom, and emergency services. European organizations were also hit. The diversity of vectors (spear-phishing, social engineering, war-themed lures, credential harvesting) is unusual for a group that WithSecure classifies as low-to-moderate sophistication.
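ClickFix lures of the kind PhantomClick uses typically ask the victim to paste a command into the Windows Run dialog or a terminal, so the resulting script host tends to run as a child of the desktop shell. The sketch below shows one way a defender might flag that pattern. It is a simplified heuristic, not WithSecure's detection logic: the `ProcessEvent` record, the list of script hosts, and the argument patterns are all assumptions made for illustration, and a real rule would need tuning against benign activity.

```python
import re
from dataclasses import dataclass


@dataclass
class ProcessEvent:
    """Hypothetical, simplified process-creation record (not a real EDR schema)."""
    parent_image: str   # full path of the parent process
    image: str          # full path of the new process
    command_line: str   # command line of the new process


# Script hosts commonly abused in paste-and-run lures (illustrative list).
SCRIPT_HOSTS = {"powershell.exe", "pwsh.exe", "cmd.exe", "mshta.exe"}

# Arguments that suggest download-and-execute behavior (illustrative, not exhaustive;
# a short pattern such as "-enc" will also match some benign flags).
SUSPICIOUS_ARGS = re.compile(
    r"-enc|\biex\b|invoke-expression|downloadstring|invoke-webrequest|https?://",
    re.IGNORECASE,
)


def _basename(path: str) -> str:
    """Return the lowercase file name from a Windows or POSIX style path."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()


def looks_like_clickfix(event: ProcessEvent) -> bool:
    """Flag a script host started by explorer.exe with download-and-run arguments."""
    return (
        _basename(event.parent_image) == "explorer.exe"
        and _basename(event.image) in SCRIPT_HOSTS
        and SUSPICIOUS_ARGS.search(event.command_line) is not None
    )
```

An event where `explorer.exe` starts `powershell.exe` with a command line containing `iex` and a URL would be flagged, while a bare `cmd.exe` opened from the Run dialog would not. The parent check matters most here: the same command line launched by a management agent or installer is far more likely to be legitimate.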
The defensive irony: the AI that enabled the group also betrayed it
The AI-generated LegionRelay had design flaws that gave WithSecure months of visibility into the group's operations. Code shipped without human review carried exploitable imperfections, LLM watermarks in phishing images provided direct forensic evidence, and recognizable AI-generated code patterns in the malware helped defenders keep tracking it.
WithSecure documented significant OpSec failures: submitting development artifacts to malware analysis platforms before operations, running cryptocurrency miners on victim machines (a possible parallel financial motivation), and naming artifacts with internet slang such as "letsrollboyos," "totallyunsus," and "cuteuwu." This is operational ambition paired with amateur technique: a group that can sustain five simultaneous campaigns but cannot clean up its digital tracks.
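Naming habits like these are easy to hunt for once they are known. As a minimal sketch, assuming a defender has a list of file names or paths collected from endpoints, the three slang strings quoted above can be used as case-insensitive markers. The function name and the input format are hypothetical, and the marker list contains only the strings cited in this article, not a complete indicator set.

```python
from typing import Iterable

# Slang strings quoted in this article; not a complete indicator list.
REPORTED_SLANG_MARKERS = ("letsrollboyos", "totallyunsus", "cuteuwu")


def find_slang_markers(artifact_names: Iterable[str]) -> dict[str, list[str]]:
    """Map each artifact name to the reported slang markers it contains."""
    hits: dict[str, list[str]] = {}
    for name in artifact_names:
        lowered = name.lower()
        matched = [marker for marker in REPORTED_SLANG_MARKERS if marker in lowered]
        if matched:
            hits[name] = matched
    return hits
```

Substring matching on names is brittle, since an attacker only has to rename a file to evade it, so a check like this works best as a cheap triage signal alongside behavioral detection rather than as a detection on its own.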
The implication is clear: the AI that lowered the entry floor also lowered the OpSec ceiling. A low-sophistication group produced output it could not have achieved without AI, but the same tools that amplified its capability also left identifiable signatures for defenders.
What this means for cyber defense
According to WithSecure, AI does not make good attackers faster. It makes mediocre attackers dangerous. A group classified as low-to-moderate sophistication sustained five simultaneous campaigns with custom malware for Windows, Android, and the web. That is output that previously required an APT with significant resources.
But the same dynamic that democratized attack also created exploitable vulnerabilities. LLM watermarks are identifiable. AI-generated code patterns are detectable. OpSec failures are more visible when an attacker scales beyond their native capability.
At Tech86, we build this logic into our managed EDR: we look not just for known signatures but for behavioral patterns that indicate AI-generated code, OpSec anomalies, and activity that scales beyond the attacker's expected sophistication. When the entry floor drops, behavioral detection becomes the most important line of defense, and the flaws that AI introduces into the attacker's code are exactly what our monitoring exploits.
