AI inference infrastructure is critical infrastructure. When 400,000 GPUs run software with three unpatched RCEs and the maintainer ignores CERT/CC, the responsibility falls on the operator. At Tech86, we have seen this pattern repeatedly: the vendor does not patch, the attacker does not wait, and the bill arrives at the workload owner.
Four RCEs, three unpatched
SGLang is the inference server behind workloads at xAI, NVIDIA, AMD, LinkedIn, Cursor, Oracle, Google Cloud, Azure, and AWS. In January 2026, the project spun out as RadixArk with Accel backing and a valuation of roughly 400 million dollars. The software running on those 400,000 GPUs has four remote code execution vulnerabilities. Only one has been fixed.
CVE-2026-5760 (CVSS 9.8) exploits GGUF files with Jinja2 payloads in chat_template. The victim loads the model, makes a request to /v1/rerank, and the payload executes. No authentication required. This is the only one patched — fixed in version 0.5.11.
CVE-2026-7301 (CVSS 9.8) targets the ZeroMQ scheduler. The multimodal runtime binds a ROUTER socket. The code defaults to 127.0.0.1, but the official documentation recommends --host 0.0.0.0 in every example, and the official Docker Compose file uses network_mode: host. In every deployment that followed the guide, the socket is exposed. The scheduler calls pickle.loads() on each message, so a single crafted pickle gives the sender remote code execution. The same pattern was fixed in March under CVE-2026-3059, but only in a different code path; this one remains open. No patch.
CVE-2026-7302 (CVSS 9.1) enables arbitrary file writes. The /v1/images/edits and /v1/videos endpoints accept uploads without sanitizing ../ in the filename. An attacker can write to any path accessible to the process. No patch.
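The missing check can be sketched in a few lines of Python. This is an illustrative sketch only, not SGLang's code: the function name safe_upload_path and the upload directory are hypothetical, and it assumes Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path


def safe_upload_path(upload_dir: str, client_filename: str) -> Path:
    """Return a path inside upload_dir for client_filename, or raise ValueError."""
    base = Path(upload_dir).resolve()
    # Treat backslashes as separators too, then keep only the final component.
    name = Path(client_filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {client_filename!r}")
    candidate = (base / name).resolve()
    # Defense in depth: after resolving symlinks, the target must still sit
    # under the upload directory.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload directory: {client_filename!r}")
    return candidate
```

With this sketch, a filename such as ../../etc/passwd is reduced to passwd inside the upload directory instead of being followed upward. Rejecting any filename that contains a separator outright would be an equally reasonable design.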
CVE-2026-7304 (CVSS 9.8) leverages the custom_logit_processor field, which accepts JSON with a callable containing a hex-encoded dill payload. The server deserializes it with dill.loads() without validation. dill extends pickle and has the same arbitrary code execution property. Exploitation requires --enable-custom-logit-processor to be active. No patch.
The timeline that should worry everyone
Antiproof reported the vulnerabilities to the maintainer on March 10. On March 25, a partial PR was opened — it only covers CVE-2026-5760. The other three were left out. On May 15, CERT/CC published VU#777338 because the maintainer did not respond. On May 26, JPCERT/CC issued an independent advisory.
Over two months since disclosure. Three unpatched CVEs rated CVSS 9.1 to 9.8. Zero patches for any of them. A company valued at roughly 400 million dollars that does not respond to CERT/CC. This is not an oversight. It is a pattern.
At Tech86, we have learned that the time between disclosure and active exploitation has shortened dramatically. When CERT/CC publishes, the adversary already knows. Every day without mitigation is a day with an open window.
Deserialization is the hole that never closes
The pattern is recurring and well-documented: pickle.loads() on untrusted data is RCE by definition. It is not a zero-day vulnerability — it is a property of the format. Python documents this explicitly. The March fix (CVE-2026-3059) should have triggered audits across all code paths with deserialization. It did not.
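A short, harmless Python sketch shows why this is a property of the format. The payload evaluates an arithmetic expression as a stand-in for whatever an attacker would actually run. The second half follows the "restricting globals" pattern from the Python pickle documentation. The class names and the allowlist are illustrative assumptions, not SGLang code.

```python
import io
import pickle


class Payload:
    """Stand-in for attacker-controlled data; the expression is deliberately harmless."""

    def __reduce__(self):
        # pickle records this (callable, args) pair and calls it at load time.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs eval("6 * 7") inside the loading process
print(result)  # → 42


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses every global lookup that is not explicitly allowlisted."""

    # Hypothetical allowlist; plain dicts, lists, strings and numbers need no entry.
    ALLOWED_GLOBALS = frozenset({("collections", "OrderedDict")})

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but only resolves allowlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Calling restricted_loads(blob) raises pickle.UnpicklingError, because the payload needs builtins.eval, while ordinary containers still load. An allowlist narrows the attack surface but is easy to get wrong; avoiding pickle for network input altogether is the more robust choice.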
SGLang uses pickle.loads() in the ZeroMQ scheduler and dill.loads() in the custom logit processor. Two distinct code paths, same vulnerability class, same root cause: deserialization of untrusted data without validation. An inference server without authentication is one crafted pickle away from host compromise.
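Where pickle between cooperating processes cannot be avoided, the Python documentation suggests signing the data with hmac so that unauthenticated bytes never reach the deserializer. The following is a minimal sketch under that assumption. The function names and key handling are hypothetical, and nothing here describes how SGLang frames its messages.

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_and_dump(obj, key: bytes) -> bytes:
    """Serialize obj and prepend an HMAC-SHA256 tag over the serialized bytes."""
    body = pickle.dumps(obj)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def verify_and_load(message: bytes, key: bytes):
    """Deserialize only if the tag matches; otherwise raise ValueError."""
    if len(message) < TAG_SIZE:
        raise ValueError("message too short to contain a tag")
    tag, body = message[:TAG_SIZE], message[TAG_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison; pickle.loads is never reached on mismatch.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; refusing to deserialize")
    return pickle.loads(body)
```

This only helps if the key stays secret and is shared solely among trusted processes. A sender that holds the key can still deliver a malicious pickle, so signing complements network isolation rather than replacing it.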
And the inference host is not just any host. It has access to proprietary models, training data, cloud credentials, and GPU compute. Compromising an inference server is not a security incident — it is a critical infrastructure incident.
The documentation that exposes everyone
The problem is not just the code. It is the guide. The official SGLang documentation recommends --host 0.0.0.0 in all deployment examples. The official Docker Compose configures network_mode: host. This means every team that followed the guide — and most do — exposed the ZeroMQ scheduler socket on the network.
The code defaults to 127.0.0.1, which is secure. But the guide says to change it, and when the official guide says to open up, teams open up. That is not negligence or imprudence; trusting a project's documentation is the normal operational flow. The problem arises when the documentation teaches an insecure configuration and the project does not fix the consequences.
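Auditing existing launch commands for this setting takes little code. The sketch below is illustrative: the helper names are hypothetical, the only flag it knows is the --host option discussed above, and it treats a missing flag as the 127.0.0.1 default described in this article.

```python
import ipaddress


def host_from_args(argv, default="127.0.0.1"):
    """Return the value of --host from an argument list, or the default."""
    for i, arg in enumerate(argv):
        if arg == "--host" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--host="):
            return arg.split("=", 1)[1]
    return default


def bind_exposure(host: str) -> str:
    """Classify a bind address as loopback, all-interfaces, or specific-interface."""
    if host == "localhost":
        return "loopback"
    # Raises ValueError for other hostnames; resolve those before calling.
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:
        return "loopback"
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all-interfaces"
    return "specific-interface"
```

Any launch command classified as all-interfaces deserves a second look. Note that with network_mode: host in Docker, a socket bound this way is exposed on the host's own interfaces, so the container boundary does not help.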
Mitigations you need to apply today
No patch exists for the three open CVEs, and nothing in the timeline suggests one is coming. The mitigations are workarounds, not root-cause fixes, but they are what exists. For CVE-2026-7301, restrict --host to 127.0.0.1 and add firewall rules on the ZeroMQ ports. For CVE-2026-7302, block /v1/images/edits and /v1/videos at the reverse proxy. For CVE-2026-7304, run the server without --enable-custom-logit-processor. If you do not need multimodal, disable it. If you do not use rerank, block the endpoint.
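The blocking decision a reverse proxy has to make can be sketched in Python. This is an assumption-laden illustration rather than a drop-in filter: the function name is hypothetical, and a real proxy must normalize paths exactly the way the backend does, or a differently encoded path can slip through.

```python
import posixpath
from urllib.parse import unquote

# Endpoints named in the mitigation above; add "/v1/rerank" if rerank is unused.
BLOCKED_PREFIXES = ("/v1/images/edits", "/v1/videos")


def is_blocked(request_target: str) -> bool:
    """Return True if the request path falls under a blocked endpoint."""
    # Drop the query string and fragment, then decode percent-escapes.
    path = request_target.split("?", 1)[0].split("#", 1)[0]
    path = unquote(path)
    # Collapse duplicate slashes and dot segments before comparing.
    path = posixpath.normpath("/" + path.lstrip("/"))
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in BLOCKED_PREFIXES
    )
```

Normalizing before matching matters: a naive string comparison would let /v1/./videos or /v1/%76ideos through if the backend later resolves them to the blocked endpoint.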
Reduce the surface. Every active endpoint is a vector. Every enabled flag is a door. The principle is the same we apply to any critical infrastructure: if it is not necessary, it does not stay exposed.
The responsibility is yours
AI inference infrastructure is critical infrastructure. It is not a test server, not a prototype, not a side project. It is the software that processes sensitive data on hardware that costs hundreds of thousands of dollars per cluster. And when the maintainer does not respond to CERT/CC for two months, the responsibility to protect does not disappear — it transfers.
At Tech86, we operate cloud infrastructure on the premise that security does not depend on the upstream vendor. If the patch does not exist, the mitigation must exist. If the documentation teaches wrong, the configuration must correct it. That is why our Cloud Servers are provisioned with network isolation by default, firewall configured before the first deploy, and attack surface reduced to the functional minimum. When the vendor fails, the infrastructure must hold.
