If the model is only 1.6% of the code, why does it matter?

The model is the engine — without it, nothing moves. But the harness is the chassis, steering, and brakes. A powerful engine without a harness is an uncontrolled projectile. The model generates capability; the harness turns it into reliability.

What is a deny-first permission framework?

It means every action starts denied. Nothing executes without explicit approval. The system automatically evaluates what can run in auto mode and what needs human confirmation. Permissions are never inherited across sessions.

Does this architecture apply to any AI project?

The principles are universal — permissions, context engineering, isolation, persistence. But implementation varies. The paper shows that the same design questions produce different architectural answers when deployment context changes. There is no universal blueprint.

The Harness Beats the Model — Claude Code Architecture

What is a harness in the context of AI agents?

A harness is the control infrastructure that wraps the model: permissions, context engineering, execution isolation, and persistence. It's what turns a language model into a reliable agent.

When Anthropic published the "Dive into Claude Code" paper, we expected to find a system dominated by AI logic. We found the opposite. Claude Code has 1,900 TypeScript files. Only 1.6% of that code is AI logic. The other 98.4% is control infrastructure. At Tech86, this confirmed what we see in practice: the model's intelligence isn't the product. The harness is the product. If two teams use similar models, the team with the better harness delivers the more secure and reliable agent.

The core is a while-loop — and that's correct

The heart of Claude Code is a queryLoop implemented as an async generator. Build the context, call the model, execute the requested tool, repeat. Simple. Anyone expecting complexity in the central loop is looking in the wrong place.

All the complexity lives in the systems around that loop. The QueryEngine abstracts the loop for the interfaces — CLI, SDK, IDE. The loop knows nothing about permissions, compaction, or persistence. It just iterates. This separation is deliberate: the core stays predictable while complexity grows in the peripheral layers.
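To make the loop shape concrete, here is a minimal sketch of an async-generator agent loop. It is not Claude Code's source: the types (`Message`, `ModelReply`, `LoopEvent`), the injected `callModel` and `runTool` functions, and the `maxTurns` cap are all hypothetical names chosen for illustration.

```typescript
// Hypothetical types for illustration; not Claude Code's real API.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; input: string };
type ModelReply = { text: string; toolCall?: ToolCall };

type CallModel = (context: Message[]) => Promise<ModelReply>;
type RunTool = (call: ToolCall) => Promise<string>;

type LoopEvent =
  | { kind: "assistant"; text: string }
  | { kind: "tool_result"; name: string; output: string };

// Build context, call the model, execute the tool, repeat.
// The loop knows nothing about permissions or persistence: those concerns
// live inside whatever callModel and runTool the caller injects.
async function* queryLoop(
  prompt: string,
  callModel: CallModel,
  runTool: RunTool,
  maxTurns = 10, // assumed safety cap, not taken from the paper
): AsyncGenerator<LoopEvent> {
  const context: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(context);
    context.push({ role: "assistant", content: reply.text });
    yield { kind: "assistant", text: reply.text };

    // No tool requested: the model is done, so the loop ends.
    if (!reply.toolCall) return;

    const output = await runTool(reply.toolCall);
    context.push({ role: "tool", content: output });
    yield { kind: "tool_result", name: reply.toolCall.name, output };
  }
}
```

Because the loop only yields events, a CLI, an SDK, or an IDE plugin can each consume the same generator with `for await` and render the events however it likes.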

In our experience, teams that try to pile logic into the central loop create fragile systems. Every new capability adds coupling. Claude Code avoids this by design — the loop is the last place you want complexity.

Five layers that don't share failure modes

The architecture splits into five independent layers:

Surface Layer: the CLI, SDK, and IDE interfaces that users touch.
Core Layer: the queryLoop, implemented as an async generator.
Safety/Action Layer: the deny-first permission framework, hooks, and the shell sandbox.
State Layer: the CLAUDE.md hierarchy, auto-memory, and JSONL append-only transcripts.
Backend Layer: real execution (Bash, PowerShell) and MCP integrations.

The independence between layers isn't accidental. If the safety layer fails, the state layer captures the violation in the transcript. If the backend executes something unexpected, the safety layer intercepts it on the next iteration. The system degrades gracefully because AI and OS don't share failure modes.

We've seen projects where the safety layer and the execution layer share state. When one fails, the other falls too. It's the most dangerous design pattern in agentic systems: coupling between who decides and who executes.

The context pipeline as a scarce resource

Context is the scarcest resource in an AI agent. Claude Code treats this with engineering rigor. Five sequential shapers process context before each model call: budget reduction, snip, microcompact, context collapse, and auto-compact. Each operates at a different granularity level.

The result: the model always receives the most relevant context within the available budget. It's not magic — it's a deterministic pipeline that prioritizes, summarizes, and compacts. The alternative, which we see frequently, is to blow past the token limit and hope the model figures it out. It doesn't work.
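The idea of a fixed sequence of shapers can be sketched in a few lines. This is an illustration under assumptions, not the real pipeline: the two shapers below (`trimToolOutput`, `dropOldest`) are simple stand-ins rather than the five stages named above, the four-characters-per-token estimate is a rough placeholder for a tokenizer, and stopping as soon as the budget fits is our own simplification.

```typescript
// Hypothetical shapes; the real shapers' internals are not described here.
type ContextItem = { kind: "user" | "assistant" | "tool"; text: string };
type Shaper = (items: ContextItem[], budget: number) => ContextItem[];

// Rough stand-in for a tokenizer: about four characters per token.
const estimateTokens = (items: ContextItem[]): number =>
  items.reduce((sum, item) => sum + Math.ceil(item.text.length / 4), 0);

// Stand-in shaper 1 (fine-grained): cap the size of each tool output.
const trimToolOutput: Shaper = (items) =>
  items.map((item) =>
    item.kind === "tool" && item.text.length > 200
      ? { ...item, text: item.text.slice(0, 200) + " [truncated]" }
      : item,
  );

// Stand-in shaper 2 (coarse): drop the oldest items until the budget fits,
// always keeping the most recent one.
const dropOldest: Shaper = (items, budget) => {
  const kept = [...items];
  while (kept.length > 1 && estimateTokens(kept) > budget) kept.shift();
  return kept;
};

// Run the shapers in a fixed order. Coarser stages only run if the
// context is still over budget after the finer ones.
function shapeContext(
  items: ContextItem[],
  budget: number,
  shapers: Shaper[],
): ContextItem[] {
  let current = items;
  for (const shaper of shapers) {
    if (estimateTokens(current) <= budget) break;
    current = shaper(current, budget);
  }
  return current;
}
```

The ordering is the design choice that matters: cheap, low-loss reductions run first, and the lossy ones run only when nothing gentler was enough.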

The transcript is append-only. No direct mutation. Immutable audit trail. Rewind and fork operations are reconstructed from the log, not restored from snapshots. This means any state can be reproduced at any time. In security audits, this property is worth more than any AI capability.
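A small sketch shows why an append-only log makes rewind and fork cheap. The `Entry` shape and the `Transcript` class are hypothetical, and the log is kept in memory as an array of JSONL lines instead of a file so the example stays self-contained. We assume each entry records its parent, which is one common way to rebuild a branch by replay.

```typescript
// Hypothetical entry shape; the real transcript schema is not specified here.
type Entry = { id: number; parentId: number | null; role: string; text: string };

class Transcript {
  // Each element is one JSONL line. Lines are only ever appended.
  private lines: string[] = [];

  // Append a new entry under `parentId` and return its id.
  append(parentId: number | null, role: string, text: string): number {
    const id = this.lines.length;
    const entry: Entry = { id, parentId, role, text };
    this.lines.push(JSON.stringify(entry));
    return id;
  }

  // Rebuild the conversation that ends at `headId` by walking parent links.
  // Rewind is a replay to an earlier id; fork is an append under an older parent.
  replay(headId: number): Entry[] {
    const entries = this.lines.map((line) => JSON.parse(line) as Entry);
    const path: Entry[] = [];
    let cursor: number | null = headId;
    while (cursor !== null) {
      const entry: Entry | undefined = entries[cursor];
      if (!entry) throw new Error(`unknown entry ${cursor}`);
      path.unshift(entry);
      cursor = entry.parentId;
    }
    return path;
  }

  get length(): number {
    return this.lines.length;
  }
}
```

Forking never rewrites history: both branches stay in the log, and either one can be reconstructed later for an audit.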

Deny-first permissions and the approval fatigue problem

The permission framework has 7 technical modes. It's deny-first: nothing executes without explicit approval. The yoloClassifier evaluates whether an action can run in auto mode. bashSecurity performs AST verification on shell commands before execution. Layer upon layer.
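The deny-first idea fits in one small function. The sketch below is an assumption-heavy illustration, not the real framework: `Rule`, `Mode`, `decide`, and the `looksSafe` callback are hypothetical names, there are only two modes instead of seven, and regular expressions stand in for the AST-level shell analysis described above.

```typescript
type Decision = "allow" | "ask" | "deny";
type Action = { tool: string; command: string };
type Rule = { tool: string; pattern: RegExp; decision: "allow" | "deny" };
type Mode = { auto: boolean; interactive: boolean };

// Hypothetical stand-in for an automated safety classifier.
type Classifier = (action: Action) => boolean;

function decide(
  action: Action,
  rules: Rule[],
  mode: Mode,
  looksSafe: Classifier,
): Decision {
  const matching = rules.filter(
    (rule) => rule.tool === action.tool && rule.pattern.test(action.command),
  );
  // An explicit deny always wins, even over the classifier.
  if (matching.some((rule) => rule.decision === "deny")) return "deny";
  // An explicit allow is the only unconditional way to run.
  if (matching.some((rule) => rule.decision === "allow")) return "allow";
  // In auto mode, the classifier may approve on the user's behalf.
  if (mode.auto && looksSafe(action)) return "allow";
  // Otherwise a human must confirm, if one is available.
  if (mode.interactive) return "ask";
  // Deny-first: with no rule, no classifier approval, and no human, the answer is no.
  return "deny";
}
```

Note that the final `return "deny"` is the default path. Every other outcome has to be earned by an explicit rule, a classifier verdict, or a person.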

But here's the number that matters: according to the paper, users approve 93% of permission prompts. Approval fatigue is real. The system responds with intelligent automation, not more prompts. Automatically classifying what's safe to run without confirmation is just as important as blocking what isn't.

And there's more: when resuming a session, permissions are not restored. The agent never inherits past authorizations. This contradicts the intuition of "convenience," but it's the correct design. Sessions are trust boundaries. Inheriting permissions from a previous session means inheriting context that may have changed.
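Treating a session as a trust boundary can be expressed directly in the data model. In this hypothetical sketch (the `Session` shape and the `startSession`, `grant`, and `resumeSession` helpers are our own names), resuming copies the history forward but always starts with an empty grant list.

```typescript
// Hypothetical session shape for illustration only.
type Session = { id: string; transcript: string[]; grants: string[] };

function startSession(id: string): Session {
  return { id, transcript: [], grants: [] };
}

// Record an approval that is valid for this session only.
function grant(session: Session, permission: string): Session {
  return { ...session, grants: [...session.grants, permission] };
}

// Resume carries the conversation forward but never the approvals.
function resumeSession(previous: Session, newId: string): Session {
  return { id: newId, transcript: [...previous.transcript], grants: [] };
}
```

The resumed agent has to ask again for anything sensitive, which costs a prompt but guarantees that no approval outlives the context it was given in.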

No universal blueprint — but the harness is universal

The paper compares Claude Code with OpenClaw and reveals something important: the same design questions produce different architectural answers when deployment context changes. An interactive CLI and autonomous CI/CD execution call for fundamentally different architectures.

But one thing is universal: if two teams use similar models, the team with the better harness — permissions, context engineering, execution isolation, persistence — delivers the more secure and reliable agent. Agentic autonomy doesn't emerge from the model's freedom. It emerges from the harness's rigor.

At Tech86, we apply this logic to every AI architecture project we design. We don't start with the model. We start with the harness — because that's what determines whether the system works on a Friday night when nobody's watching.

What this means for your project

The lesson from Claude Code isn't "copy this architecture." It's "understand the principle." The principle is that the harness is the product. The model is a commodity: everyone has access to the same models. What separates a reliable agent from a polished demo is the control infrastructure around it.

If you're building AI agents and investing 80% of your effort in model logic, you're investing in the wrong place. The model is already good. What most projects lack is the harness that turns capability into reliability. That's why at Tech86, our AI architecture consulting always starts with the same question: how does your system fail? Not how it works — how it fails. The answer to that question defines the harness you need.
