Anthropic put an AI agent with access to secrets in your CI/CD pipeline. Microsoft showed how to steal everything in two steps. This isn't hypothetical — it's a documented structural vulnerability with a working proof-of-concept and 50 bypasses in the permission system. At Tech86, we've seen prompt injection incidents in AI agents grow exponentially this year. The Claude Code GitHub Action case is the most revealing because the problem isn't a bug — it's the design.
Two paths, one without a sandbox
Claude Code has two execution paths. The Bash tool runs in a Bubblewrap sandbox with environment scrubbing — the CLAUDE_CODE_SUBPROCESS_ENV_SCRUB variable strips credentials from the environment before execution. The Read tool, which reads files, had no sandbox. The call is in-process, direct, bypassing Bubblewrap entirely.
The result: the Read tool accessed /proc/self/environ and read ANTHROPIC_API_KEY in cleartext. The sandbox that protected Bash was irrelevant to Read. Anthropic secured the front door and left the window open.
But reading the key is only half the problem. Microsoft built a payload that defeated two defenses simultaneously.
Microsoft's two steps — defeating filter and scanner
The first obstacle: Claude's filter refuses to print keys starting with sk-ant-. The payload instructs the model: "cut the first 7 characters of the key." The LLM complies, emitting the key without the prefix. The output passes the filter because it doesn't contain the blocked pattern.
The second obstacle: GitHub's Secret Scanner redacts known patterns in stdout. But since the LLM modified the key before emission — removing the prefix — the scanner didn't detect it. The text reaching the log isn't a valid key. The attacker reconstructs the full key by prepending sk-ant- on the outside.
Exfiltration via WebFetch, Bash, GitHub MCP, or Action logs. Any agent output channel works. Microsoft proved that two layers of defense — model filter and platform scanner — are defeated when the attacker controls what the model emits.
RyotaK: 50 bypasses and the bot that became admin
On June 1, RyotaK from GMO Flatt Security published 50 bypasses in Claude Code's permission system. The most impactful: a fake bot with a name ending in [bot] bypassed the permission check. The Action auto-trusts any GitHub App — no identity validation.
With this bypass, RyotaK exfiltrated OIDC tokens and obtained an installation token with full repository write access. If the attack targeted Anthropic's own repository, it would poison the Action used by thousands of projects. Supply chain attack at scale, without touching a single npm package.
Anthropic's official example shipped with allowed_non_write_users: "*". Many copied it. Any non-write user had full access. In February, a similar attack on Cline stole an npm publish token and pushed unauthorized @openclaw/cli. 4,000 developers affected in 8 hours. Fixed in v1.0.94. CVSS 7.8. Bounty: 4,800 dollars.
Comment and Control: the same pattern across three agents
Johns Hopkins researchers documented the "Comment and Control" pattern in April: the same attack vector works in Claude Code, Gemini CLI Action, and Copilot Agent. All vulnerable.
In Copilot Agent, hidden HTML comments — invisible to humans in GitHub's interface — contained base64-encoded instructions that bypassed the secret scanner. Exfiltration via git push to an attacker-controlled repository. The agent reads the comment, decodes the instruction, executes it, and sends data out.
Claude Code: CVSS 9.4 Critical. Anthropic downgraded it to "None." Bounty: 100 dollars. For context: Google paid 1,337 dollars for the same type of vulnerability in Gemini. GitHub paid 500 for Copilot. Anthropic paid 100 and classified it as nonexistent. Downgrading CVSS 9.4 to "None" with a 100 dollar bounty sends the wrong signal to researchers and to the industry.
The Agents Rule of Two — the missing principle
Microsoft formulated the "Agents Rule of Two": an AI workflow should never have all three simultaneously — untrusted input processing, secret access via tools, and external communication capability. If two already exist, the third cannot be added.
This is fundamental because prompt injection isn't a model bug — it's context the agent was designed to process. PR titles and issue comments are legitimate SDLC data the agent needs to read. The attacker hijacks context within the boundaries of the intended workflow. There is no patch for "the agent reads what it's supposed to read."
We're entering an era where natural language is executable code, and untrusted inputs like GitHub issues should be treated as hostile by default. The Rule of Two is the Principle of Least Privilege applied to autonomous agents: if the agent needs to read issues, don't give it secret access. If it needs secrets, don't allow external communication. If it needs to communicate, don't process untrusted input.
At Tech86, we apply this principle across every AI integration in CI/CD. We audit workflows, isolate tools with credential access, and treat all SDLC input as untrusted. If your team runs AI agents in pipelines without this separation, Microsoft's attack isn't a hypothetical scenario — it's your next incident.
