
Claude Code: 6 Disclosures, 1 Real Attack, and the Architectural Disease

Gabriel Ferraresi · CEO | Tech86 · June 17, 2026 · 5 min
Tags: claude-code, security, supply-chain, ci-cd, prompt-injection

A structural vulnerability in the Claude Code GitHub Action produced 6 separate disclosures, 1 source code leak, and 1 real supply chain attack — and the problem remains unsolved. We are not talking about an isolated bug. We are talking about a vulnerability class that each specific patch leaves intact.

The sandbox bypass: ANTHROPIC_API_KEY in plaintext

Microsoft published on June 5: the Read tool in Claude Code made direct in-process calls, bypassing the Bubblewrap sandbox that protected the Bash tool. The result was direct — any prompt injection in an issue or PR could read /proc/self/environ and exfiltrate the ANTHROPIC_API_KEY in plaintext.

Microsoft built a payload that defeated TWO defenses simultaneously. Claude's filter refuses to print keys starting with sk-ant-, so the payload instructed it to "cut the first 7 characters", laundering the output. GitHub Secret Scanner did not detect the key because the LLM had modified it before it reached stdout. The attacker then reconstructs the key by prepending sk-ant-. Per Microsoft, the bypass was trivial.
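The laundering step can be illustrated with a minimal sketch. The key below is a made-up placeholder, and prefix_scanner and leaks_secret are hypothetical helpers, not the real Claude filter or GitHub Secret Scanner rules. The sketch only shows why a detector keyed on the sk-ant- prefix misses a key with its first 7 characters removed, while a check that knows the actual secret value can still spot a long enough fragment.

```python
import re

# Hypothetical placeholder key, invented for illustration only.
FAKE_KEY = "sk-ant-EXAMPLE0123456789abcdef"

# A prefix-based detector (an assumption for this sketch, not a real scanner rule).
PREFIX_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}")


def prefix_scanner(text: str) -> bool:
    """Return True if the text contains something shaped like a full key."""
    return PREFIX_PATTERN.search(text) is not None


def leaks_secret(text: str, secret: str, min_fragment: int = 12) -> bool:
    """Return True if any min_fragment-long slice of the secret appears in text."""
    if len(secret) < min_fragment:
        return secret in text
    return any(
        secret[i:i + min_fragment] in text
        for i in range(len(secret) - min_fragment + 1)
    )


laundered = FAKE_KEY[7:]          # "cut the first 7 characters"
rebuilt = "sk-ant-" + laundered   # the attacker prepends the prefix again

print(prefix_scanner(laundered))          # the prefix-based detector misses it
print(leaks_secret(laundered, FAKE_KEY))  # the value-aware check flags it
print(rebuilt == FAKE_KEY)                # the key is fully recoverable
```

A value-aware check like this is only possible where the monitor already holds the secret, and it can itself be evaded by other transformations (for example, encoding the key), so it narrows the gap rather than closing it.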

The detail that concerns us is not just the exfiltration itself. It is that two defense layers specifically designed to prevent this scenario were defeated by a natural language payload. Claude's content filter and GitHub Secret Scanner — both active, both functioning as designed — failed because the LLM acted as an intermediary that transformed the key before any detector could recognize it. It is a new kind of threat: the AI agent itself is the evasion mechanism.

But that is only the tip. The cascade of disclosures reveals the problem is systemic.

The cascade: 6 disclosures, 5 vectors

1. RyotaK (GMO Flatt Security, June): 50 ways to bypass the permission system. A fake bot with a name ending in [bot] received auto-trust from the action. It stole OIDC tokens and obtained an installation token with full write access. Patch: v1.0.94. Bounty: $4,800 per the bug bounty program.

2. Aonan Guan (JHU, "Comment and Control", April): same pattern in Claude Code, Gemini CLI Action, and Copilot Agent. For Claude Code, Anthropic assigned CVSS 9.4 (initially 9.3), then downgraded the severity to "None". Bounties: Anthropic $100, Google $1,337, GitHub $500.

3. Clinejection (February): a REAL supply chain attack. Cline used claude-code-action with allowed_non_write_users: "*". Per Adnan Khan, an attacker exploited prompt injection in the triage workflow, poisoned the GitHub Actions cache (the "Cacheract" technique), and stole the npm publish token, then published an unauthorized cline release with a postinstall script that installed openclaw. The permissive allowed_non_write_users: "*" setting was the pattern that facilitated the attack. Per StepSecurity, the release saw 4,000 downloads in 8 hours.

4. Source code leak (March): Anthropic leaked 512K lines of Claude Code via source map on npm. Per Trend Micro and Zscaler, attackers weaponized the leak within 24 hours with fake repositories distributing Vidar stealer and GhostSocks proxy malware, disguised as "Claude Code with enterprise features unlocked". The leak was not a security disclosure — it was an operational error that exposed the tool's internal logic and gave attackers the complete blueprint to build more sophisticated exploits.

5. HackerBot-Claw (February): first documented case of AI attacking AI in CI/CD. An autonomous agent (Claude Opus 4.5, per the bot's self-description) attempted to prompt-inject another agent via a poisoned CLAUDE.md. Claude detected the attempt. Of the 7 targets, 5 were compromised via traditional CI/CD vectors (shell injection, cache poisoning, script injection). The only AI-vs-AI attack was detected and blocked. But the 71% compromise rate shows that conventional defenses failed systematically — and AI-based detection, while effective in the single case tested, is no guarantee against non-AI vectors.

6. CVE-2026-47751 (June): per the GitHub Advisory Database (reported via HackerOne by reptou), an attacker opens a PR with a malicious .mcp.json, Claude Code loads it, and achieves RCE. A third distinct compromise vector. MCP (Model Context Protocol) was designed to extend agent capabilities — but without origin validation, any configuration file becomes a vector for arbitrary execution.
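Items 5 and 6 both rely on a file that the agent loads implicitly from the repository (a poisoned CLAUDE.md, a malicious .mcp.json). A minimal sketch of one possible guard follows: flag any pull request whose changed files include such a file before an agent runs on it. The function name and the set of file names are assumptions for illustration, and how the list of changed paths is obtained is left out.

```python
from pathlib import PurePosixPath

# File names an agent may load implicitly; this set is an assumption for the sketch.
AGENT_CONFIG_NAMES = {".mcp.json", "CLAUDE.md"}


def risky_agent_files(changed_paths: list[str]) -> list[str]:
    """Return the changed paths whose file name is an agent-loaded config file."""
    return [
        path
        for path in changed_paths
        if PurePosixPath(path).name in AGENT_CONFIG_NAMES
    ]


# Example: a PR that touches ordinary code plus two agent-loaded files.
changed = ["src/app.py", ".mcp.json", "docs/CLAUDE.md"]
flagged = risky_agent_files(changed)
if flagged:
    print("Require maintainer review before running the agent:", flagged)
```

A check like this does not validate the contents of the file. It only forces a human decision whenever untrusted input is about to become agent configuration.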

Supply chain: the attack that was not theoretical

Clinejection deserves special attention because it was a real attack with measurable impact. Cline, an AI extension for VS Code, used claude-code-action with allowed_non_write_users: "*". Per Adnan Khan, an attacker exploited prompt injection in the triage workflow, poisoned the GitHub Actions cache (the "Cacheract" technique), and stole the npm publish token, then published an unauthorized cline release with a postinstall script that installed openclaw. The permissive allowed_non_write_users: "*" setting was the pattern that facilitated the attack. Per StepSecurity, 4,000 downloads occurred in the 8 hours before removal.

The pattern is familiar to anyone tracking supply chain attacks in package ecosystems: compromise a publishing account, inject code into a legitimate version, distribute before detection. The difference is that the entry vector was a permissive configuration in an AI GitHub Action — not credential stuffing or phishing. The attacker did not need to break npm authentication. The AI action handed over the token.

At Tech86, we have seen this pattern repeat in infrastructure audits: permissive configurations in automation tools that function as side doors. The allowed_non_write_users: "*" is the functional equivalent of chmod 777 — convenient for testing, catastrophic in production.
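A permissive setting like this is easy to look for mechanically. The following is a minimal sketch, assuming workflow files are scanned as plain text; has_wildcard_non_write_users is a hypothetical helper, and a line-based regular expression is not a full YAML parser, so it can miss unusual formatting.

```python
import re

# Matches a line such as:  allowed_non_write_users: "*"
# Quotes are optional in the pattern, and a trailing comment is tolerated.
WILDCARD_USERS = re.compile(
    r"""^\s*allowed_non_write_users\s*:\s*["']?\*["']?\s*(#.*)?$""",
    re.MULTILINE,
)


def has_wildcard_non_write_users(workflow_text: str) -> bool:
    """Return True if the workflow text grants the agent to every non-write user."""
    return WILDCARD_USERS.search(workflow_text) is not None


risky_workflow = 'with:\n  allowed_non_write_users: "*"\n'
scoped_workflow = 'with:\n  allowed_non_write_users: "dependabot[bot]"\n'

print(has_wildcard_non_write_users(risky_workflow))   # True
print(has_wildcard_non_write_users(scoped_workflow))  # False
```

Run over every file under .github/workflows in a pre-merge check, a scan of this kind turns the chmod 777 analogy into something a pipeline can refuse automatically.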

The patch covers the symptom, not the disease

The patch for the issue Microsoft reported (v2.1.128, May) blocked /proc/ in the Read tool. Per Microsoft, it is a specific fix, not a structural one. The vulnerability class remains open.
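The difference between a specific fix and a structural one can be sketched as a denylist versus an allowlist for file reads. This is an illustration only and makes no claim about how the actual patch is implemented; both function names are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def is_denied_by_prefix(path: str) -> bool:
    """Denylist approach: block only paths under /proc/."""
    return path.startswith("/proc/")


def is_inside_workspace(path: str, workspace: str) -> bool:
    """Allowlist approach: permit only paths that resolve inside the workspace."""
    # An absolute `path` replaces `workspace` when joined, and resolve()
    # collapses ".." segments and follows symlinks before the comparison.
    resolved = Path(workspace, path).resolve()
    return resolved.is_relative_to(Path(workspace).resolve())


workspace = "/tmp/example-workspace"  # hypothetical checkout directory

# The denylist stops the one known path but says nothing about any other.
print(is_denied_by_prefix("/proc/self/environ"))  # True
print(is_denied_by_prefix("/etc/shadow"))         # False

# The allowlist rejects both, and also rejects traversal out of the workspace.
print(is_inside_workspace("/proc/self/environ", workspace))  # False
print(is_inside_workspace("../../etc/shadow", workspace))    # False
print(is_inside_workspace("src/main.py", workspace))         # True
```

An allowlist still leaves secrets reachable if they live inside the workspace or arrive through another tool, which is why path filtering alone does not remove the underlying class.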

Each disclosure revealed a different vector — sandbox bypass, permission system, supply chain, source map leak, AI-vs-AI, MCP RCE — but all share the same root: an AI agent with untrusted input, access to secrets, and external communication, simultaneously. The patch blocks one path. The attacker finds another.

The bounty discrepancy is revealing. Anthropic paid $100 for a CVSS 9.4. Google paid $1,337 for the same pattern. GitHub paid $500. When bounties do not reflect severity, researchers lose incentive to report responsibly.

The Agents Rule of Two: principle without enforcement

Meta formulated the rule: an AI workflow must never simultaneously have all three — (1) untrusted input, (2) access to secrets, (3) external communication. Microsoft endorsed the principle. But endorsement is not enforcement.

Per Microsoft, "we are entering an era where natural language is executable code, and untrusted inputs like GitHub issues must be treated as hostile by default." The statement is correct. But without automatic validation in the pipeline that prevents the coexistence of all three factors, the rule depends on manual discipline. And manual discipline fails at scale.
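What automatic validation could look like is sketched below, under the assumption that each AI workflow is described by three boolean flags. WorkflowProfile and the example workflow names are hypothetical; deriving the flags from real workflow definitions is the hard part and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hypothetical description of what an AI workflow can touch."""
    name: str
    untrusted_input: bool
    secret_access: bool
    external_communication: bool


def violates_rule_of_two(profile: WorkflowProfile) -> bool:
    """Return True when all three risk factors coexist in one workflow."""
    return (
        profile.untrusted_input
        and profile.secret_access
        and profile.external_communication
    )


def failing_workflows(profiles: list[WorkflowProfile]) -> list[str]:
    """Return the names of workflows that a pipeline gate should reject."""
    return [p.name for p in profiles if violates_rule_of_two(p)]


profiles = [
    # Triage on public issues, with secrets and network access: all three factors.
    WorkflowProfile("issue-triage", True, True, True),
    # Same untrusted input, but no secrets mounted: at most two factors.
    WorkflowProfile("issue-labeler", True, False, True),
]

print(failing_workflows(profiles))  # ['issue-triage']
```

A gate of this kind moves the rule from manual discipline to a check that fails the build, which is the gap between endorsement and enforcement described above.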

At Tech86, we apply this principle in practice: our managed EDR monitors AI agent behavior in CI/CD — access to secrets, unexpected external communication, command execution outside the baseline. When architectural isolation fails, behavioral detection is the last line of defense. Six disclosures in months show that relying on the sandbox alone is not enough.


Frequently Asked Questions

The Read tool in Claude Code made direct in-process calls, bypassing the Bubblewrap sandbox that protected the Bash tool. Per Microsoft, any prompt injection in an issue or PR could read /proc/self/environ and exfiltrate the ANTHROPIC_API_KEY in plaintext. The problem is architectural: the sandbox protected one tool but not the other.

In February, Cline used claude-code-action with allowed_non_write_users: "*". Per Adnan Khan, an attacker exploited prompt injection in the triage workflow, poisoned the GitHub Actions cache (the "Cacheract" technique), and stole the npm publish token, then published an unauthorized cline release with a postinstall script that installed openclaw. The permissive allowed_non_write_users: "*" setting was the pattern that facilitated the attack. Per StepSecurity, there were 4,000 downloads in 8 hours. This was a REAL supply chain attack, not a theoretical one.

The v2.1.128 patch (May, per Microsoft) blocked /proc/ in the Read tool. It is a specific fix, not a structural one. The vulnerability class remains open: 6 disclosures, 5 distinct vectors, and 1 real attack demonstrate that the problem is architectural, not incidental.

Per Meta, an AI workflow must never simultaneously have: (1) untrusted input, (2) access to secrets, and (3) external communication. Microsoft endorsed the principle. But endorsement is not enforcement — without automatic validation in the pipeline, the rule depends on manual discipline.
