A critical vulnerability in Ollama allows unauthenticated attackers to extract the entire process memory of exposed servers using just three API calls. Tracked as CVE-2026-7482 and nicknamed Bleeding Llama, the vulnerability puts roughly 300,000 internet-facing servers at risk.
Ollama is the most widely used open-source platform for running large language models locally, with over 170,000 GitHub stars and 100 million Docker Hub downloads. It has become the standard self-hosted AI inference engine across enterprises, research labs, and development teams. That scale of adoption is precisely what makes this vulnerability so serious.
With no authentication required and a public proof-of-concept already available, attackers can walk away with API keys, system prompts, user conversations, and cloud credentials, all extracted silently from Ollama's heap memory without triggering a single error or crash.
What Is CVE-2026-7482 (Bleeding Llama)?
Risk Analysis
| Field | Detail |
|---|---|
| Severity | Critical |
| CVSS v3.1 Score | 9.1 |
| Exploit Availability | Yes (Public PoC on GitHub) |
| Exploit Complexity | Low |
| Authentication Required | None |
| CVSS Vector | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H |
Bleeding Llama is a heap out-of-bounds read vulnerability in Ollama’s GGUF model loader. An unauthenticated attacker can exploit it to read the Ollama process’s entire heap memory and silently exfiltrate that data to an external server. The attack requires just three unauthenticated API calls and leaves no error in the logs, making detection without dedicated endpoint monitoring practically impossible.
The patch is available in Ollama v0.17.1. If your instance was internet-accessible before patching, assume sensitive data has been exposed and rotate all secrets immediately.
Why Are 300,000 Ollama Servers Exposed to This Vulnerability?
Ollama’s default configuration binds to 127.0.0.1. But the widely used OLLAMA_HOST=0.0.0.0 setting opens it to all network interfaces, and the REST API has no built-in authentication. The /api/create and /api/push endpoints at the heart of this attack are completely open by default.
The disclosure timeline made things worse. The patch shipped in Ollama v0.17.1 on February 25, 2026, but the release notes did not flag it as a security fix. MITRE did not respond to the CVE request for two months. Without a CVE number, the vulnerability was invisible to scanners and patch management tools. Operators had no signal to prioritize the update.
In Kubernetes and container environments, vulnerable images spread quickly across auto-scaled workloads if outdated base images remain in deployment pipelines, multiplying the exposure further.
Many operators were unaware their instances were reachable at all. If you are running Ollama in your environment, the first step is knowing whether it is exposed. This is exactly the visibility gap AppTrana addresses, continuously monitoring your external attack surface and flagging exposed Ollama instances before they become an entry point.
How Does the Bleeding Llama Attack Work?
The root cause is a missing bounds check in Ollama’s GGUF tensor parsing code. When a GGUF file is submitted to the /api/create endpoint, Ollama reads the declared tensor offset and size without verifying they match the file’s actual length. An attacker crafts a GGUF file with a declared tensor shape far larger than the data actually present. During quantization in fs/ggml/gguf.go and server/quantization.go, the server reads past the allocated heap buffer. Whatever lives in memory beyond that boundary gets captured.
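The missing check amounts to validating the declared tensor extent against the file's real length before any read. A simplified sketch of that validation follows; the function and parameter names are illustrative, not Ollama's actual Go code:

```python
def validate_tensor_bounds(tensor_offset: int, tensor_size: int,
                           data_start: int, file_size: int) -> None:
    """Reject a GGUF tensor whose declared extent runs past the end of the file.

    tensor_offset is relative to the start of the tensor-data section
    (data_start); tensor_size is the byte length implied by the declared
    shape and element type.
    """
    end = data_start + tensor_offset + tensor_size
    if tensor_offset < 0 or tensor_size < 0 or end > file_size:
        raise ValueError(
            f"tensor extent [{tensor_offset}, {tensor_offset + tensor_size}) "
            f"exceeds file size {file_size}"
        )
```

Without a check of this kind, a tensor declared as, say, 10 MB in a 1 MB file causes the reader to walk 9 MB past the buffer into adjacent heap memory.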
The attack then uses a lossless float-16 to float-32 conversion to preserve the stolen bytes in readable form rather than corrupting them through lossy quantization. The leaked memory is folded into the resulting model artifact and exfiltrated via Ollama’s built-in push functionality.
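The reason a float-16 to float-32 widening preserves the stolen bytes is that every finite half-precision bit pattern maps exactly onto a wider float and back; only lossy quantization paths would destroy them. A small stdlib illustration (Python's intermediate is float64 rather than float32, but the exactness argument is the same; NaN bit patterns are the one case that may not round-trip):

```python
import struct


def f16_roundtrip(raw: bytes) -> bytes:
    """Widen little-endian float16 bytes to a larger float type and narrow them back."""
    count = len(raw) // 2
    values = struct.unpack(f"<{count}e", raw)   # f16 -> wider float, exact for finite values
    return struct.pack(f"<{count}e", *values)   # narrowing an exactly-representable value restores the bits


# Arbitrary non-NaN "heap" bytes survive the widening unchanged.
leaked = b"\x12\x34\x56\x78"
assert f16_roundtrip(leaked) == leaked
```

This is why the attacker recovers readable memory from the pushed artifact instead of numeric noise.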
The three-step attack chain:
Step 1: Deliver the payload. Upload a malicious GGUF file with an inflated tensor shape to the /api/blobs endpoint via HTTP POST. No authentication required.
Step 2: Trigger the memory leak. Call /api/create to trigger model creation. Ollama hits the out-of-bounds read during quantization and folds leaked heap memory into the model artifact.
Step 3: Exfiltrate the data. Call /api/push pointing to an attacker-controlled registry. Ollama uploads the artifact, now containing stolen heap data, to the attacker’s server.
What Data Does CVE-2026-7482 Expose?
Heap memory retains data across requests, exposing everything the server has processed since its last restart. Confirmed data types at risk include:
- User prompts submitted to the Ollama API by any user or application
- System prompts from other models loaded on the same server
- Environment variables, which frequently contain API keys, database credentials, cloud service secrets, and authentication tokens
- Fragments of concurrent user conversations
- Outputs from connected tools and coding assistants routing through Ollama
The risk compounds in enterprise environments where Ollama is connected to coding assistants, internal tools, or AI agent workflows. Everything those integrations pass through Ollama’s heap is at risk.
Disclosure Timeline: How a Patched Vulnerability Stayed Hidden for Months
- February 2, 2026: Cyera researcher Dor Attias reported the vulnerability to Ollama
- February 25, 2026: Ollama acknowledged the issue and shared a fix, asking the researcher to submit the CVE independently
- February 26, 2026: Researcher warned Ollama that releasing a fix without flagging it as a security update leaves operators exposed
- March 2, 2026: CVE request submitted to MITRE. No response received
- March 26, 2026: Follow-up sent to MITRE. Still no response
- April 26, 2026: Researcher escalated to Echo, a third-party CVE Numbering Authority
- April 28, 2026: Echo assigned CVE-2026-7482
- May 1, 2026: Echo published the CVE record, making it visible to security tools
- May 2026: Public disclosure and broad media coverage
The window between patch availability and public awareness was nearly three months. Operators running any version before v0.17.1 had no CVE, no scanner alert, and no release note flagging the urgency.
Two Additional Unpatched Flaws in Ollama for Windows
Researchers disclosed two vulnerabilities in Ollama’s Windows update mechanism that chain into persistent code execution. Both remain unpatched as of May 2026.
CVE-2026-42248 (CVSS 7.7): Missing Signature Verification. The Windows client does not verify the update binary before installation. An attacker controlling the update server can supply an arbitrary executable that runs at the next application start.
CVE-2026-42249 (CVSS 7.7): Path Traversal in Windows Updater. The updater creates the staging directory path directly from HTTP response headers without sanitization. Combined with the missing signature check, this allows an attacker to write an executable to the Windows Startup folder, achieving persistent code execution at every user login.
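The class of bug behind CVE-2026-42249, deriving a filesystem path from an untrusted HTTP header, is avoidable by resolving the candidate path and confirming it stays inside the intended staging directory. A generic sketch of that defense, not the actual updater code:

```python
from pathlib import Path


def safe_staging_path(base_dir: str, untrusted_name: str) -> Path:
    """Resolve untrusted_name under base_dir, rejecting any traversal escape."""
    base = Path(base_dir).resolve()
    candidate = (base / untrusted_name).resolve()
    # The resolved path must equal base or sit strictly below it.
    if base != candidate and base not in candidate.parents:
        raise ValueError(f"path escapes staging directory: {untrusted_name!r}")
    return candidate
```

With this check, a header value like `../../Startup/evil.exe` is rejected instead of being written outside the staging directory.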
Affected versions: Ollama for Windows 0.12.10 through 0.17.5.
How to Fix CVE-2026-7482: Remediation and Hardening Steps
If your Ollama instance was internet-accessible at any point before patching, treat it as compromised. Start with these steps:
Immediate actions:
- Upgrade to Ollama v0.17.1 or later. Verify with: ollama --version
- Audit every Ollama instance in your environment. Isolate any instance binding to 0.0.0.0 and reachable from outside a trusted network
- Rotate all secrets immediately if your instance was internet-accessible before patching
- Review logs for suspicious /api/create or /api/push calls, especially those referencing external registry URLs
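The log review can be partly automated. The sketch below flags `/api/push` requests whose destination registry is not on an allow-list; the log line shape and JSON field are assumptions, so adapt the pattern to your proxy or server log format:

```python
import re
from typing import Iterable

# Adjust to the registries your environment legitimately pushes to.
TRUSTED_REGISTRIES = {"registry.ollama.ai"}

# Assumed log shape: one request per line, with the pushed model reference in a JSON body.
PUSH_RE = re.compile(r'POST\s+/api/push\b.*?"model"\s*:\s*"([^"]+)"')


def flag_suspicious_pushes(lines: Iterable[str]) -> list[str]:
    """Return log lines pushing models toward hosts outside the allow-list."""
    hits = []
    for line in lines:
        m = PUSH_RE.search(line)
        if not m:
            continue
        model_ref = m.group(1)          # e.g. "evil.example.com/leak/model:latest"
        host = model_ref.split("/")[0]  # a leading component with a dot is a registry host
        if "." in host and host not in TRUSTED_REGISTRIES:
            hits.append(line.rstrip())
    return hits
```

Any hit referencing an external or unknown registry in the exposure window should be treated as a likely exfiltration attempt.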
Hardening steps:
- Bind Ollama to 127.0.0.1. Remove OLLAMA_HOST=0.0.0.0 from production unless strictly required
- Deploy an authentication proxy or API gateway in front of all Ollama instances. The REST API has no built-in authentication
- Implement network segmentation for AI and ML workloads
- Avoid passing sensitive environment variables to the Ollama process
- For Windows: disable automatic updates and remove the Ollama Startup folder shortcut until CVE-2026-42248 and CVE-2026-42249 are patched
- Monitor /api/create and /api/push activity for anomalous model names or external registry references
How AppTrana Reduces Your Ollama Attack Surface
AppTrana’s AI Server Scan actively identifies exposed Ollama instances in your environment. If your Ollama deployment is internet-accessible without authentication or network controls, the scan surfaces it as an exposed asset, giving your team visibility to isolate and remediate before attackers find it.
AppTrana also detects and blocks suspicious /api/create and /api/push requests carrying malformed or oversized GGUF file uploads before they reach the Ollama server. This disrupts the initial delivery step, stopping the attack before the memory leak is triggered. Suspicious /api/push calls referencing external or unknown registries are flagged as potential exfiltration attempts.
Stay tuned for more relevant and interesting security articles. Follow Indusface on Facebook, Twitter, and LinkedIn.