Xiaomi — the same company that makes your Redmi phone and your air purifier — released an open-source coding agent that challenges Claude Code. MiMo Code. MIT license. 10.8K stars in 16 days. Fork of OpenCode. We analyzed the numbers, the caveats, and what this actually means for AI engineering.
What is MiMo Code
According to Xiaomi, MiMo Code is a terminal-native coding agent with full tool use: files, bash, Git, LSP, and MCP. It has a subagent system with parallel execution, persistent memory via SQLite FTS5 with checkpoint and self-maintenance (dream/distill), Max Mode with parallel best-of-N (N=5), and Compose Mode for specs-driven development with built-in skills (planning, TDD, code review, debugging). Voice input via TenVAD + MiMo ASR.
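To make the "parallel best-of-N" idea behind Max Mode concrete, here is a minimal sketch. It is not Xiaomi's implementation: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, and the scoring function stands in for whatever selection signal the real harness uses (tests passed, a reviewer model, and so on).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def best_of_n(generate: Callable[[int], str],
              score: Callable[[str], float],
              n: int = 5) -> Tuple[str, float]:
    """Run `generate` n times in parallel and return the top-scoring candidate."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each worker gets a different seed so the candidates differ.
        candidates: List[str] = list(pool.map(generate, range(n)))
    scored = [(candidate, score(candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (assumptions): a real agent would call a model here
# and score each candidate by running the test suite.
def toy_generate(seed: int) -> str:
    return "patch-" + "x" * seed


def toy_score(candidate: str) -> float:
    return float(len(candidate))


best, best_score = best_of_n(toy_generate, toy_score, n=5)
print(best, best_score)
```

The cost trade-off is visible in the structure: N candidates mean roughly N times the inference spend, so the selection signal has to be good enough to justify it.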
The underlying model is MiMo-V2.5-Pro: 1.02T total parameters, 42B active per inference, MoE with hybrid attention (Sliding Window + Global), 1M token context, 3 MTP layers for speculative decoding (~3x speedup). Pre-trained on 27 trillion tokens. Native FP8.
Bonus: according to Xiaomi (self-reported, no independent verification), MiMo-7B-RL (7.8B dense) matches o1-mini on math/code benchmarks. OpenAI has not published o1-mini's parameter count, so any size ratio, such as "20x larger", is an estimate and not a measured figure.
The benchmarks — and why you should read them with caution
According to Xiaomi (self-reported, no independent verification):
- SWE-bench Verified: 82% vs Claude Code's 79% (+3)
- SWE-bench Pro: 62% vs 55% (+7)
- Terminal-Bench 2.0: 73% vs 69% (+4)
- Double-blind test with 576 devs: >65% win rate on tasks with 200+ steps. Below 200 steps: ~50/50.
The numbers are impressive. But they are all self-reported by Xiaomi with zero independent verification. We repeat this because it is fundamental: there is no third-party audit, no published reproducibility, no peer review. It is the equivalent of a company publishing its own NPS — it might be true, but you do not know if it is.
And the comparison is only vs Claude Code (Sonnet 4.6). Codex CLI + GPT-5.5 scores 82.2% on Terminal-Bench 2.0 — 9 points above MiMo Code. When the comparison field widens, the narrative changes.
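The effect of widening the comparison can be checked with a few lines of arithmetic. The sketch below uses only the Terminal-Bench 2.0 figures quoted in this article; whether the three scores were measured under identical conditions is not something the article's sources establish, so treat the ranking as illustrative. The helper name `gaps_to` is our own.

```python
# Terminal-Bench 2.0 scores as quoted in this article (percent).
scores = {
    "MiMo Code": 73.0,
    "Claude Code": 69.0,
    "Codex CLI + GPT-5.5": 82.2,
}


def gaps_to(reference: str, table: dict) -> dict:
    """Point difference of every other entry relative to `reference`."""
    ref = table[reference]
    return {
        name: round(value - ref, 1)
        for name, value in table.items()
        if name != reference
    }


ranking = sorted(scores, key=scores.get, reverse=True)
gaps = gaps_to("MiMo Code", scores)

print(ranking)
print(gaps)
```

Against Claude Code alone, MiMo Code leads by 4 points; add one more competitor and it sits in second place, 9.2 points behind.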
Harness > model — the real innovation
The real innovation in MiMo Code is not the model. It is the harness — the orchestration architecture around the model. According to Xiaomi (self-reported, no independent verification), even using the same MiMo-V2.5-Pro model, the MiMo Code harness scores ~5 points above the Claude Code harness on SWE-bench Pro.
The difference lies in the memory architecture: checkpoint-writer saves context at regular intervals, context rebuild reconstructs context when the token limit approaches, and dream/distill compresses and consolidates memories between sessions. This gives an advantage on long tasks — exactly where the double-blind test showed >65% win rate.
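The checkpoint-and-rebuild loop described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not MiMo Code's actual code: the `MemoryStore` class, the `checkpoints` table, the whitespace token counter, and the 80% threshold are all hypothetical, and the dream/distill consolidation step is left out. It does use SQLite FTS5, the storage engine the article mentions, through Python's standard `sqlite3` module (which needs an SQLite build with FTS5 enabled).

```python
import sqlite3
from typing import List


class MemoryStore:
    """Checkpoint summaries in an FTS5 table so they can be searched later."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS checkpoints "
            "USING fts5(step UNINDEXED, summary)"
        )

    def write_checkpoint(self, step: int, summary: str) -> None:
        self.db.execute(
            "INSERT INTO checkpoints (step, summary) VALUES (?, ?)",
            (step, summary),
        )
        self.db.commit()

    def search(self, query: str, limit: int = 3) -> List[str]:
        rows = self.db.execute(
            "SELECT summary FROM checkpoints WHERE checkpoints MATCH ? "
            "ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
        return [row[0] for row in rows]


def count_tokens(text: str) -> int:
    # Crude proxy (assumption): one token per whitespace-separated word.
    return len(text.split())


def rebuild_context(history: List[str], store: MemoryStore, task_query: str,
                    token_limit: int, threshold: float = 0.8) -> List[str]:
    """Near the limit, swap old history for recalled checkpoint summaries."""
    used = sum(count_tokens(message) for message in history)
    if used < threshold * token_limit:
        return history  # Still room: keep the full history.
    recalled = store.search(task_query)
    return recalled + history[-2:]  # Summaries plus the two latest messages.


store = MemoryStore()
store.write_checkpoint(10, "refactored auth module to use tokens")
store.write_checkpoint(20, "fixed flaky database migration test")

history = [f"msg{i} " + "word " * 49 for i in range(4)]  # 50 tokens each
rebuilt = rebuild_context(history, store, "auth", token_limit=220)
untouched = rebuild_context(history, store, "auth", token_limit=1000)
```

The design point is that the agent never depends on the full transcript surviving: whatever matters gets written out early, then pulled back by search once the window fills up. That is plausibly why the advantage shows up on long tasks.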
It is the same lesson SantanderAI taught: architecture matters more than raw model capability. The model is a commodity; the harness is the differentiator. When SantanderAI open-sourced its AI stack, the signal was that the infrastructure layer is becoming commoditized. MiMo Code reinforces this: if the same model scores differently depending on the harness, the value is not in the model; it is in the orchestration.
The caveats that matter
Four caveats that do not appear in the README:
Self-reported benchmarks: zero independent verification. Xiaomi may have selected favorable tasks, calibrated hyperparameters specifically for benchmarks, or simply reported the best runs. Without reproducibility, the numbers are indicative, not conclusive.
Limited comparison: only vs Claude Code. Codex CLI + GPT-5.5 scores 82.2% on Terminal-Bench 2.0 — 9 points higher. If the comparison were vs Codex, the narrative would be different.
MiMo Auto and Chinese law: MiMo Auto is free for a limited time and routes code through Xiaomi servers. Xiaomi is a Chinese company subject to Chinese law, which includes obligations to cooperate with government authorities. For proprietary code, customer data, and trade secrets, this is a real risk. Local deployment is the alternative, but it requires roughly 8x H200 GPUs (about 1.1 TB of combined VRAM; at one byte per parameter, the 1.02T parameters alone occupy roughly 1 TB in FP8).
V0.1.0: this is the first public release. Software at version 0.1.0 typically has bugs, unstable APIs, and incomplete documentation. Do not treat it as production-ready without extensive validation.
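For the local-deployment caveat above, a back-of-the-envelope memory check helps size the hardware. The sketch assumes one byte per parameter for FP8 weights and 141 GB of memory per H200 (NVIDIA's published capacity), and it ignores KV cache, activations, and runtime overhead, so real requirements will be higher. Figures quoted elsewhere may rest on different assumptions.

```python
# Model figures as stated in this article.
TOTAL_PARAMS = 1.02e12      # 1.02T total parameters
ACTIVE_PARAMS = 42e9        # 42B active per inference

# Assumptions for this estimate.
BYTES_PER_PARAM_FP8 = 1     # FP8 stores one byte per weight
H200_VRAM_GB = 141          # per-GPU memory, per NVIDIA's spec sheet
NUM_GPUS = 8

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_FP8 / 1e9
cluster_gb = NUM_GPUS * H200_VRAM_GB
headroom_gb = cluster_gb - weights_gb
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"weights:   {weights_gb:.0f} GB")
print(f"cluster:   {cluster_gb} GB")
print(f"headroom:  {headroom_gb:.0f} GB for KV cache and activations")
print(f"active per token: {active_fraction:.1%} of parameters")
```

Because this is a mixture-of-experts model, only about 4% of the parameters are active per token, but all of them must still sit in memory. The sparsity saves compute, not VRAM.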
The pattern: SantanderAI → Xiaomi → who is next?
The pattern is clear: open-source from unexpected sources is redefining AI infrastructure. First Santander — a European bank open-sourcing its AI governance stack. Now Xiaomi — a Chinese phone manufacturer releasing a coding agent that challenges the market leader.
What do these two have in common? They are not AI companies. They are companies that depend on AI and decided that the infrastructure layer is not a competitive differentiator. Santander opened guardrails and bridges because every bank needs them. Xiaomi opened a coding agent because the model is a commodity — the harness is what matters.
At Tech86, we see this pattern accelerating. Companies in non-AI sectors will increasingly open-source AI infrastructure. The next one could be an automaker, a retailer, or a logistics company. When infrastructure is open-source, differentiation migrates to the application — and that is where we help companies compete: adopting the open-source layer and differentiating on integration, calibration, and operations.
