Which frameworks and models were tested?

Four frameworks were tested: Claude Code, OpenHands, Codex CLI, and Gemini CLI. Five models were tested: Claude Sonnet 4.6, GLM-4.7, MiniMax-M2.5, GPT-5.4, and Gemini 2.5 Pro. The study generated 1,070 adversarial skills from 81 seeds, covering 15 MITRE ATT&CK categories. 4 CVEs were confirmed and 2 fixes have been deployed.

Why are skill registries the new supply chain?

Skills carry the privileges of curated libraries: the agent trusts the content because it came from a registry. But skills are audited far less than npm or PyPI packages. Anyone can publish a skill whose code examples are malicious. The agent loads, reads, and executes that content without the scrutiny a package manager applies.

Does multi-model verification work in practice?

Yes. Only 1.6% of payloads bypassed all 5 models simultaneously, which means 98.4% were caught by at least one model. Compared with single-configuration bypass rates of 11.6% to 33.5%, that is a relative reduction of roughly 86% to 95%. The 1.6% figure is for all 5 models combined; with only 2 or 3 verifiers the escape rate can only be equal or higher. The cost is additional latency on each verification.

PoisonedSkills: Skill Docs That Make AI Agents Run Malware

What is PoisonedSkills and how does it differ from prompt injection?

PoisonedSkills uses Document-Driven Implicit Payload Execution (DDIPE). Instead of injecting malicious instructions directly into the prompt, the attack embeds the payload in code examples and config templates within skill documentation. The agent reads the doc as part of normal execution and reproduces the malicious code without knowing it's under attack. Under strong defenses, explicit attack instructions achieve a 0% bypass rate; DDIPE achieves 11.6% to 33.5%.

What is the Codex CLI sandbox semantic bypass?

Codex CLI uses a filesystem sandbox, but the attack exploited a gap in semantic permissions: the payload ran operations that looked legitimate within the skill context but violated the sandbox's intent. The sandbox restricted what operations could run, but not why they were run. One of the 4 confirmed CVEs came from this vector.

Your AI coding agent can execute malicious code just by reading a skill's documentation. No explicit prompt. No user command. The payload sits inside a code example. The agent reproduces it as part of normal execution. This is PoisonedSkills: Document-Driven Implicit Payload Execution. 4 frameworks, 5 models, 11.6% to 33.5% bypass rate. 4 CVEs confirmed.

DDIPE: the attack that bypasses what direct prompt injection cannot

DDIPE stands for Document-Driven Implicit Payload Execution. The technique embeds malicious logic in code examples and config templates within skill documentation. Coding agents like Claude Code, OpenHands, Codex CLI, and Gemini CLI load skills from public registries. When processing a task, they read the docs and reproduce the code examples as part of normal execution. The payload is inside the example. The agent executes it without knowing it's malicious.

The number that changes everything: explicit attack instructions — "execute this malicious command" — achieve 0% bypass under strong defenses. DDIPE achieves 11.6% to 33.5%. Payload disguised as documentation bypasses what direct prompt injection cannot. Model alignment recognizes an attack instruction. It does not recognize a code example that looks legitimate.
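Because the payload lives in code examples, one defensive angle is to treat every example in a skill doc as untrusted code and scan it before the agent reads it. The sketch below is a hypothetical illustration, not something taken from the study: it assumes skill docs are Markdown with triple-backtick fences, and the names `RISKY_PATTERNS` and `scan_skill_doc` are invented for this example. The patterns are simple heuristics that a disguised payload could evade, so treat this as a first filter rather than a detector.

```python
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Illustrative heuristics only (assumption): not a vetted detection set.
RISKY_PATTERNS = {
    "pipe-to-shell": re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(ba|z)?sh\b"),
    "decode-and-exec": re.compile(r"base64\s+(-d|--decode)[^\n]*\|\s*(ba|z)?sh\b"),
    "credential-read": re.compile(r"(\.ssh/id_|\.aws/credentials|\.npmrc)"),
}

# Matches an opening fence (with optional language tag), the body, and a closing fence.
CODE_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(doc: str) -> list[str]:
    """Return the body of every fenced code block in a Markdown skill doc."""
    return CODE_BLOCK.findall(doc)


def scan_skill_doc(doc: str) -> list[tuple[int, str]]:
    """Return (block_index, pattern_name) for each risky match in code examples."""
    findings = []
    for index, block in enumerate(extract_code_blocks(doc)):
        for name, pattern in RISKY_PATTERNS.items():
            if pattern.search(block):
                findings.append((index, name))
    return findings
```

A skill whose examples produce any finding could be held for human review instead of being loaded. The important design choice is that the scan runs over code examples specifically, since that is where DDIPE hides its payload.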

The numbers: 1,070 skills, 15 MITRE categories, 4 CVEs

The study generated 1,070 adversarial skills from 81 seeds, covering 15 MITRE ATT&CK categories. Frameworks tested: Claude Code, OpenHands, Codex CLI, and Gemini CLI. Models: Claude Sonnet 4.6, GLM-4.7, MiniMax-M2.5, GPT-5.4, and Gemini 2.5 Pro.

4 CVEs were confirmed. 2 fixes have been deployed. The attack surface is real and exploitable today — not theoretical research. Each CVE represents a concrete path for malicious code execution through skill documentation in production-grade frameworks.

The 4 case studies: sandbox bypass, CI/CD amplifier, skill-loading, and cosmetic fixes

Codex CLI: sandbox semantic bypass. The agent escaped the sandbox through a permission gap. The sandbox restricted filesystem operations but didn't validate the semantics of the action. The payload executed operations that looked legitimate within the skill context but violated the sandbox's intent.

Gemini CLI: CI/CD as attack amplifier. In headless mode, Gemini CLI strips permission confirmations. The payload executes without human interaction. An agent running in CI/CD without approval gates is a silent execution vector — nobody sees it, nobody approves it, nobody blocks it.
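The headless case suggests a fail-closed approval gate: when nobody is present to confirm, high-risk commands are denied rather than silently allowed. The following is a minimal sketch under stated assumptions, not Gemini CLI's actual mechanism. The command list and the function names are hypothetical, headless detection relies on the `CI` environment variable that many CI providers set, and classifying a command by its first token is deliberately naive (it would miss a download wrapped in `bash -c`, for example).

```python
import os
import shlex
import sys

# Hypothetical policy (assumption): commands whose first token is listed need approval.
HIGH_RISK_COMMANDS = {"curl", "wget", "ssh", "scp", "rm", "chmod", "sudo"}


def is_headless(env=None, stdin=None) -> bool:
    """Treat the run as headless if CI is set or stdin is not a terminal."""
    env = os.environ if env is None else env
    stdin = sys.stdin if stdin is None else stdin
    return bool(env.get("CI")) or not stdin.isatty()


def is_high_risk(command: str) -> bool:
    """Classify a shell command by its first token; unparsable input counts as risky."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True
    return bool(tokens) and os.path.basename(tokens[0]) in HIGH_RISK_COMMANDS


def approve(command: str, headless: bool, ask=input) -> bool:
    """Fail closed: high-risk commands are denied when nobody can confirm them."""
    if not is_high_risk(command):
        return True
    if headless:
        return False
    answer = ask(f"Run high-risk command {command!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

In a pipeline, the caller would pass `headless=is_headless()` and abort the job when `approve` returns False, which turns the silent execution path into a visible failure.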

Claude Code: skill-loading bypass in local settings. The skill-loading mechanism can be manipulated through local configurations, allowing malicious skills to be injected without registry validation.

OpenHands: prompt-level fix, not architectural. The correction was made to the system prompt, not to the framework's architecture. Prompt-level fixes are fragile — model or context changes can reopen the vector. It treats the symptom, not the disease.

The structural lesson: alignment and guardrails are complementary, not interchangeable

Two defense layers interact asymmetrically: model alignment and architectural guardrails. Removing architectural protection amplifies payload execution 11.3x for weakly aligned models, but only 1.6x for well-aligned models. The difference is brutal.

Alignment alone doesn't suffice. Guardrails alone don't suffice. Both must be present. Well-aligned models without guardrails still execute sophisticated payloads. Guardrails without alignment are circumventable through semantic gaps. Effective defense is the interaction between both layers — each covering the other's failures.

The data point that supports multi-model verification: only 1.6% of payloads bypassed all 5 models simultaneously. Checking code against all of them before a high-risk action would cut the escape rate from as high as 33.5% for a single configuration to under 2%; with fewer verifiers, the rate falls somewhere in between. It's the most viable countermeasure we have today.
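The gate logic behind multi-model verification can be sketched in a few lines. This is an illustration under assumptions, not the study's harness: each verifier is modeled as a plain callable that returns True when it judges code malicious, whereas in practice each one would wrap a call to a different model. The names `Verifier`, `is_blocked`, and `escape_rate` are hypothetical.

```python
from typing import Callable, Iterable

# A verifier takes code and returns True when it judges the code malicious.
# Real verifiers would call different models (assumption); plain callables
# are used here so the gate logic can be shown on its own.
Verifier = Callable[[str], bool]


def is_blocked(code: str, verifiers: Iterable[Verifier], min_flags: int = 1) -> bool:
    """Block when at least `min_flags` verifiers flag the code.

    With min_flags=1, a payload only gets through if it bypasses every
    verifier, which is the "bypassed all models simultaneously" case.
    """
    flags = sum(1 for verify in verifiers if verify(code))
    return flags >= min_flags


def escape_rate(payloads: list[str], verifiers: list[Verifier]) -> float:
    """Fraction of payloads that no verifier flagged."""
    if not payloads:
        return 0.0
    escaped = sum(1 for payload in payloads if not is_blocked(payload, verifiers))
    return escaped / len(payloads)
```

Raising `min_flags` reduces false positives at the price of letting more payloads through, so for high-risk actions the any-verifier-blocks setting is the conservative default.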

Skill registries are the new supply chain

Skills carry the privileges of curated libraries without the same level of auditing. When an agent loads a skill from a registry, it assumes the content is trustworthy. Code examples in documentation are not harmless text — they are code that the agent reproduces and executes. The same delegated trust that exists in npm install exists in skill loading, but without the audit ecosystem that npm built around lockfiles, integrity checks, and scope validation.
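To make the lockfile comparison concrete, here is a minimal sketch of integrity pinning applied to skill docs. It is an assumption about how such a check could look, not a feature of any existing skill registry: the lock is a hypothetical mapping from skill name to a SHA-256 digest recorded when the skill was last reviewed, and `digest` and `verify_skill` are names invented for this example.

```python
import hashlib
import hmac


def digest(content: bytes) -> str:
    """SHA-256 of the skill doc, in the 'sha256-<hex>' form used by this sketch."""
    return "sha256-" + hashlib.sha256(content).hexdigest()


def verify_skill(name: str, content: bytes, lock: dict[str, str]) -> bool:
    """Return True only if the skill is pinned and its content matches the pin."""
    pinned = lock.get(name)
    if pinned is None:
        return False  # unpinned skills are rejected, not trusted by default
    return hmac.compare_digest(pinned, digest(content))
```

A pin only proves the doc has not changed since it was reviewed; it says nothing about whether the review caught a disguised payload, so it complements rather than replaces scanning and approval gates.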

CI/CD headless mode is an attack amplifier. An agent in CI without human interaction is a silent bypass. The combination of unaudited skill registries with autonomous CI/CD creates an attack surface that nobody is monitoring.

At Tech86, we evaluate attack surfaces in AI agent pipelines — skill registries, CI/CD permissions, architectural guardrails. If your devs use coding agents in production, you need to know what those skills can do before the first payload lands.
