
SGLang: 4 RCEs, 3 still unpatched, in the AI inference server

Gabriel Ferraresi · CEO | Tech86 · May 29, 2026 · 5 min
sglang · rce · security · ai inference · vulnerability

AI inference infrastructure is critical infrastructure. When 400,000 GPUs run software with three unpatched RCEs and the maintainer ignores CERT/CC, the responsibility falls on the operator. At Tech86, we have seen this pattern repeatedly: the vendor does not patch, the attacker does not wait, and the bill arrives at the workload owner.

Four RCEs, three unpatched

SGLang is the inference server behind workloads at xAI, NVIDIA, AMD, LinkedIn, Cursor, Oracle, Google Cloud, Azure, and AWS. In January 2026, the project spun out as RadixArk with Accel backing and a valuation of roughly 400 million dollars. The software running on those 400,000 GPUs has four remote code execution vulnerabilities. Only one has been fixed.

CVE-2026-5760 (CVSS 9.8) exploits GGUF files with Jinja2 payloads in chat_template. The victim loads the model, makes a request to /v1/rerank, and the payload executes. No authentication required. This is the only one patched — fixed in version 0.5.11.

CVE-2026-7301 (CVSS 9.8) targets the ZeroMQ scheduler. The multimodal runtime binds a ROUTER socket and calls pickle.loads() on every message it receives, so a single crafted pickle is enough for RCE. The code defaults to 127.0.0.1, but the official documentation recommends --host 0.0.0.0 in every example and the official Docker Compose uses network_mode: host, so every deployment that followed the guide exposes the socket to the network. The same pattern was fixed in March under CVE-2026-3059, but that fix covered a different code path and left this one untouched. No patch.

CVE-2026-7302 (CVSS 9.1) enables arbitrary file writes. The /v1/images/edits and /v1/videos endpoints accept uploads without sanitizing ../ in the filename. An attacker can write to any path accessible to the process. No patch.
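A minimal sketch of the kind of filename handling whose absence makes this class of bug possible. It is not SGLang code: the function name safe_upload_path and the upload directory are hypothetical, and it assumes Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path


def safe_upload_path(upload_dir: str, client_filename: str) -> Path:
    """Map a client-supplied filename to a path inside upload_dir, or raise ValueError."""
    base = Path(upload_dir).resolve()
    # Keep only the final path component, discarding any directory parts
    # (including "../") the client sent. Backslashes are normalized first
    # so Windows-style separators are handled the same way.
    name = Path(client_filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"rejected filename: {client_filename!r}")
    candidate = (base / name).resolve()
    # Defense in depth: after resolving symlinks, the target must still
    # live under the upload directory.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload dir: {client_filename!r}")
    return candidate
```

With this check, a filename such as ../../etc/passwd is reduced to passwd inside the upload directory instead of climbing out of it. A stricter variant could reject any name containing a separator rather than stripping it.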

CVE-2026-7304 (CVSS 9.8) leverages the custom_logit_processor field, which accepts JSON with a callable containing a hex-encoded dill payload. The server deserializes it with dill.loads() without validation. Dill is a superset of pickle — same arbitrary execution property. Requires --enable-custom-logit-processor to be active. No patch.
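As a defense-in-depth sketch for this field, a gateway in front of the server could refuse any JSON request that carries custom_logit_processor at any nesting level. The function names are hypothetical and nothing here is part of SGLang. It assumes the protected routes only accept JSON bodies, and it fails closed on anything it cannot parse. It complements disabling the flag and does not replace it.

```python
import json

BLOCKED_FIELD = "custom_logit_processor"


def contains_blocked_field(value) -> bool:
    """Recursively look for the blocked key in decoded JSON data."""
    if isinstance(value, dict):
        return any(
            key == BLOCKED_FIELD or contains_blocked_field(child)
            for key, child in value.items()
        )
    if isinstance(value, list):
        return any(contains_blocked_field(child) for child in value)
    return False


def is_request_allowed(raw_body: bytes) -> bool:
    """Allow a request only if it is valid JSON without the blocked field."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes: fail closed.
        return False
    return not contains_blocked_field(payload)
```

The filter inspects key names only and never decodes the hex payload, so it performs no deserialization of its own.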

The timeline that should worry everyone

Antiproof reported the vulnerabilities to the maintainer on March 10. On March 25, a partial PR was opened — it only covers CVE-2026-5760. The other three were left out. On May 15, CERT/CC published VU#777338 because the maintainer did not respond. On May 26, JPCERT/CC issued an independent advisory.

Over two months since disclosure. Three unpatched CVEs rated CVSS 9.1 to 9.8. Zero patches for any of them. A company valued at roughly 400 million dollars that does not respond to CERT/CC. This is not an oversight; it is a pattern.

At Tech86, we have learned that the time between disclosure and active exploitation has shortened dramatically. When CERT/CC publishes, the adversary already knows. Every day without mitigation is a day with an open window.

Deserialization is the hole that never closes

The pattern is recurring and well-documented: pickle.loads() on untrusted data is RCE by definition. It is not a zero-day vulnerability — it is a property of the format. Python documents this explicitly. The March fix (CVE-2026-3059) should have triggered audits across all code paths with deserialization. It did not.

SGLang uses pickle.loads() in the ZeroMQ scheduler and dill.loads() in the custom logit processor. Two distinct code paths, same vulnerability class, same root cause: deserialization of untrusted data without validation. An inference server without authentication is one crafted pickle away from host compromise.
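A harmless sketch of why this is a property of the format rather than a one-off bug. The Payload class and the allow-list are illustrative assumptions, not SGLang code. The restricted unpickler follows the "Restricting Globals" pattern described in the Python pickle documentation.

```python
import builtins
import io
import pickle


class Payload:
    """Stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # On load, pickle calls eval("40 + 2"). A real attacker would put
        # os.system or similar here; eval on a constant keeps the demo safe.
        return (eval, ("40 + 2",))


# Only these builtins may be resolved while unpickling (illustrative list).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    blob = pickle.dumps(Payload())
    print(pickle.loads(blob))  # the sender's code ran on load: prints 42
    try:
        restricted_loads(blob)
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

An allow-list narrows the problem but is still a stopgap. Moving the message format to a data-only encoding such as JSON removes the code-execution path entirely.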

And the inference host is not just any host. It has access to proprietary models, training data, cloud credentials, and GPU compute. Compromising an inference server is not a security incident — it is a critical infrastructure incident.

The documentation that exposes everyone

The problem is not just the code. It is the guide. The official SGLang documentation recommends --host 0.0.0.0 in all deployment examples. The official Docker Compose configures network_mode: host. This means every team that followed the guide — and most do — exposed the ZeroMQ scheduler socket on the network.

The code defaults to 127.0.0.1, which is secure. But the guide says to change it, and when the official guide says to open up, teams open up. That is not negligence or imprudence; trusting an open source project's documentation is the normal operational flow. The problem arises when the documentation teaches an insecure configuration and the project does not fix the consequences.

Mitigations you need to apply today

The patch is not coming. The mitigations are workarounds, not root-cause fixes. But they are what exists. For CVE-2026-7301, restrict --host to 127.0.0.1 and add firewall rules on ZeroMQ ports. For CVE-2026-7302, block /v1/images/edits and /v1/videos at the reverse proxy. For CVE-2026-7304, disable --enable-custom-logit-processor. If you do not need multimodal, disable it. If you do not use rerank, block the endpoint.
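Two of these mitigations come down to launch flags, so they can be checked mechanically. The sketch below assumes a hypothetical helper, audit_launch_command, that scans a launch command string for the two flags named above. The example command line is illustrative only, and the check does not cover firewall or reverse proxy rules.

```python
import shlex

# Bind addresses that listen on every interface (IPv4 and IPv6 wildcards).
WILDCARD_HOSTS = {"0.0.0.0", "::"}


def audit_launch_command(command: str) -> list[str]:
    """Return human-readable findings for risky flags in a launch command."""
    findings = []
    args = shlex.split(command)
    for i, arg in enumerate(args):
        host = None
        # Accept both "--host VALUE" and "--host=VALUE".
        if arg == "--host" and i + 1 < len(args):
            host = args[i + 1]
        elif arg.startswith("--host="):
            host = arg.split("=", 1)[1]
        if host in WILDCARD_HOSTS:
            findings.append(f"--host {host} listens on all interfaces; use 127.0.0.1")
        if arg == "--enable-custom-logit-processor":
            findings.append("--enable-custom-logit-processor is set; remove it")
    return findings


if __name__ == "__main__":
    example = (
        "python -m sglang.launch_server --model-path some/model "
        "--host 0.0.0.0 --enable-custom-logit-processor"
    )
    for finding in audit_launch_command(example):
        print(finding)
```

A check like this could run in CI against deployment manifests, so that a configuration copied from the guide is flagged before it reaches production.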

Reduce the surface. Every active endpoint is a vector. Every enabled flag is a door. The principle is the same we apply to any critical infrastructure: if it is not necessary, it does not stay exposed.

The responsibility is yours

AI inference infrastructure is critical infrastructure. It is not a test server, not a prototype, not a side project. It is the software that processes sensitive data on hardware that costs hundreds of thousands of dollars per cluster. And when the maintainer does not respond to CERT/CC for two months, the responsibility to protect does not disappear — it transfers.

At Tech86, we operate cloud infrastructure on the premise that security does not depend on the upstream vendor. If the patch does not exist, the mitigation must exist. If the documentation teaches wrong, the configuration must correct it. That is why our Cloud Servers are provisioned with network isolation by default, firewall configured before the first deploy, and attack surface reduced to the functional minimum. When the vendor fails, the infrastructure must hold.


Frequently Asked Questions

SGLang is an AI inference server used by xAI, NVIDIA, AMD, LinkedIn, Cursor, Oracle, Google Cloud, Azure, and AWS. It runs on approximately 400,000 GPUs. The four vulnerabilities allow unauthenticated remote code execution — three of them have no patch available. Any deployment that followed the official documentation is exposed.

Yes. The official documentation recommends --host 0.0.0.0 in all examples, and the official Docker Compose uses network_mode: host. This exposes the ZeroMQ scheduler socket on the network. The same pattern was previously fixed in March under CVE-2026-3059, but in a different code path. If you followed the guide, you are exposed.

There is no indication they will. Antiproof reported the vulnerabilities on March 10. The partial PR on March 25 does not cover three of the four CVEs. CERT/CC published VU#777338 on May 15 after the maintainer failed to respond. JPCERT/CC issued an advisory on May 26. Over two months with no fix for three CVEs rated CVSS 9.1 to 9.8.

Pickle is Python's native serialization format. Deserializing untrusted pickle data allows arbitrary code execution; this is a documented property of the format. SGLang calls pickle.loads() on every ZeroMQ scheduler message and dill.loads() in the custom logit processor. Any attacker with access to the socket can send a crafted payload and achieve RCE.

Not necessarily, but you must mitigate now. Apply network restrictions, block unnecessary endpoints, and disable the custom logit processor. If your infrastructure cannot support these mitigations, evaluate alternatives like vLLM with patches applied. The patch is not coming — the responsibility to protect is yours.
