Most teams building AI-powered products focus on what their LLM can do. Fewer ask what their LLM might give away. OWASP LLM02:2025 in OWASP Top 10 LLM covers exactly that risk: Sensitive Information Disclosure.

When an AI model leaks system prompts, training data, internal API keys, or confidential user information, the damage is immediate and hard to contain. The DeepSeek incident demonstrated this at scale. Understanding how these leaks happen and how to stop them is now a security baseline for any team deploying LLM-powered applications.

LLM02:2025 Sensitive Information Disclosure

OWASP LLM02:2025 describes the risk that an LLM application exposes sensitive information through its outputs. This includes data the model was trained on, data passed to it via system prompts or context windows, and data generated in response to user inputs that were designed to extract it.

Unlike traditional data leaks that target databases or file systems, LLM02 exploits the model itself as the delivery mechanism. The attacker does not need to break through a perimeter. They simply ask the right question in the right way.

LLM02 is distinct from a misconfiguration or a code vulnerability. The leak comes from the model’s own outputs, which makes it harder to detect with conventional security controls that inspect traffic at the edge rather than inspecting what the model says.

Sensitive information in this context covers a wide range of data types. System prompts that describe how the model behaves, business logic embedded in context windows, API keys or credentials passed as part of instructions, personally identifiable information from training data, and confidential internal documentation all qualify.

The five most common forms of sensitive information disclosure in LLM applications are:

System prompt leakage: The model reveals its own instructions when prompted by a user who phrases a request to trigger self-description or reflection.
Training data extraction: The model regenerates verbatim content it was trained on, including names, contact details, or proprietary text.
Context window exposure: Documents, database results, or API responses injected into the prompt context are echoed back in model outputs.
Credential disclosure: API keys, tokens, or connection strings passed to the model as instructions are surfaced to end users through crafted queries.
Cross-user data bleed: In multi-tenant applications, one user’s session data or conversation history bleeds into another user’s context window.

Why LLM02:2025 Sensitive Information Disclosure Matters

Sensitive information disclosure is a critical risk in real-world LLM deployments, where dynamically generated outputs can amplify even small control gaps into significant security, compliance, and operational issues.

LLM outputs are difficult to fully control: Responses are generated by combining training data, runtime context, and user input. This makes sensitive information disclosure harder to predict and harder to prevent using traditional application security controls.
Sensitive data can originate from both users and systems: Personal identifiable information, financial data, health records, legal documents, credentials, and confidential business data may enter LLM workflows through normal usage, increasing the risk of later exposure through outputs.
Exposed data spreads beyond the initial interaction: Once sensitive information appears in an output, it may be stored, forwarded, logged, or reused by downstream systems and users, making containment and remediation significantly more complex.
Regulatory and compliance risks escalate quickly: Even limited disclosure of regulated or confidential data can trigger violations of data protection laws, contractual obligations, and internal governance policies.
Disclosure weakens security posture: Leaked information can expose internal system behavior, proprietary algorithms, or credentials, enabling attackers to conduct follow-on attacks or bypass existing controls.
Most disclosures occur silently: Sensitive information often appears in valid-looking responses without generating alerts or errors, allowing exposure to persist unnoticed in production environments.

The DeepSeek Incident: What Happened and Why It Matters

In January 2025, Wiz Research discovered a publicly accessible ClickHouse database belonging to DeepSeek while assessing its external security posture. The database was found within minutes using standard reconnaissance techniques.

The exposure included over a million lines of log streams containing chat history, secret keys, backend details, and other highly sensitive information. The database was completely open and unauthenticated, with no defense mechanism to the outside world.

The breach exposed more than user conversations. System prompt templates describing how DeepSeek’s AI was configured, internal API keys that could access backend services, and log data revealing infrastructure topology were all accessible. This is a textbook LLM02 scenario at operational scale.

The incident illustrates a pattern seen across early GenAI deployments. Teams move quickly to ship AI-powered features. Security controls for the underlying infrastructure, the prompt pipeline, and the model’s outputs are treated as a follow-up task rather than a prerequisite. By the time the gap is visible, data is already exposed.

How Sensitive Information Disclosure Occurs in LLM-Based Systems

Most LLM applications follow a similar architecture. A user sends a message, the application assembles a prompt that includes system instructions, retrieved context, and conversation history, and the model generates a response. Sensitive data enters this pipeline at multiple points and exits through the model’s outputs if it is not filtered.

LLMs are trained to be helpful and to use all available context when generating a response. If a system prompt contains an API key and a user asks “what do you have access to?”, a model without guardrails may answer accurately and completely. The model is doing exactly what it was designed to do. The security gap is in the application design, not the model itself.

These disclosure paths typically fall into five categories:

Training Data Memorization: When sensitive records are present in training data, the model may retain and later reproduce fragments of that information. This recall can be triggered by specific prompts and is difficult to predict, since it depends on how patterns were learned during training rather than on direct data access.
Unsafe Use of Runtime Context: LLM applications often pass live context from databases, documents, or APIs into prompts. If this data is not properly filtered, the model can include confidential or regulated information in its response. Because this context is treated as input, disclosure may occur without clear system errors.
Prompt Manipulation: System prompts and safety instructions guide model behavior. They do not enforce hard limits. Carefully crafted inputs can weaken these controls, allowing the model to return information it was meant to withhold, even in otherwise well-configured systems.
Exposure of Proprietary Details: Poorly constrained outputs can reveal internal model logic, training artifacts, or proprietary algorithms. Over time, this information can be used to infer model behavior or extract sensitive intellectual property.
Configuration Weaknesses: Misconfigurations such as exposed system prompts, verbose error messages, or broad internal access frequently enable disclosure. These vulnerabilities often persist in production, leading to repeated exposure without deliberate exploitation.

Addressing these disclosure paths requires controls at each layer: what enters the model context, what the model is allowed to output, and how the surrounding infrastructure is configured and monitored.

Exposed LLM infrastructure is one of the most common configuration weaknesses, publicly accessible inference endpoints allow attackers to query your model directly without bypassing any controls. Read: Exposed LLM Infrastructure: Risks & Exploits

How to Prevent or Mitigate LLM02:2025 Sensitive Information Disclosure

Preventing sensitive information disclosure in LLM applications requires consistent controls across data, access, configuration, and monitoring, applied before data reaches the model and maintained throughout its operation.

Data Sanitization and Input Validation: Sensitive information should be removed, masked, tokenized, or redacted before it is used for training or passed into the model during inference. Strong input validation must also be applied to detect and block sensitive or harmful content before it reaches the model, reducing the risk of confidential data being learned or disclosed through prompts or contextual inputs.
Access Control and Data Source Restriction: Model access to data should follow strict least-privilege principles, limiting interaction to only what is necessary for the intended function. External APIs, document repositories, and runtime data sources must be tightly controlled to prevent unintended leakage through loosely managed or overly broad integrations.
Privacy-Preserving Learning Techniques: Federated learning can reduce centralized data exposure by keeping training data distributed across locations, lowering the impact of large-scale leaks. Differential privacy further limits disclosure by adding controlled noise to data or outputs, making it difficult to reconstruct individual records from model responses.
Secure System Configuration: System prompts, preambles, and internal instructions must be protected from user access or override. Configuration hardening is equally important, including suppressing verbose error messages and avoiding exposed settings by following established security misconfiguration best practices such as OWASP API security guidance.
User Education and Transparency: Users should be clearly guided on safe interaction practices, including avoiding the submission of sensitive information. At the same time, organizations must maintain transparency around data usage, retention, and deletion, and provide opt-out mechanisms for including user data in training processes.
Output Monitoring and Detection: Model outputs should be continuously monitored for unexpected sensitive data, abnormal response patterns, or signs of prompt-based extraction. Early detection and response help contain isolated disclosures before they escalate into broader security or compliance incidents.

Securing with AppTrana AI Shield

AppTrana AI Shield helps reduce the risk of sensitive information disclosure by adding enforcement and visibility at the interaction layer where LLM risks surface. The fully managed AI firewall continuously inspects inputs, contextual data, and generated responses for exposure patterns, applies policy-based controls to limit unsafe behavior, and detects prompt-driven attempts to extract protected information. By combining real-time monitoring with configuration hardening and managed oversight, AppTrana AI Shield helps ensure that sensitive data remains contained even as LLM-powered features are deployed at scale.

Explore how AI-Shield detects and helps mitigate LLM data exposure risks. Request a Demo.

Frequently Asked Questions

No. Sensitive information disclosure can involve both user-provided data and system-level data. This includes internal records, retrieved documents, logs, runtime context, and proprietary model assets such as training data or internal logic that are processed by the LLM during response generation.

Sensitive data exposed through LLMs can include personally identifiable information (PII), financial and health records, legal documents, security credentials, confidential business information, and proprietary model assets such as training data or internal logic. Exposure can affect both the model and the applications embedding it.

Sensitive information leakage typically occurs due to training data memorization, unsafe use of runtime context, prompt manipulation, weak output controls, or configuration vulnerabilities such as exposed system prompts and verbose error messages. These vulnerabilities often remain unnoticed until sensitive data appears in model responses.

LLM outputs are dynamically generated and often trusted by downstream systems. Once sensitive information appears in an output, it can be stored, reused, or acted upon immediately, making containment difficult. Even a single disclosure can lead to privacy violations, regulatory non-compliance, intellectual property loss, or broader security compromise.

A traditional data breach involves an attacker gaining unauthorized access to a storage system. LLM02 uses the model itself as the disclosure mechanism. The attacker does not breach a database. They interact with the AI application through its intended interface and craft prompts that cause the model to reveal sensitive information in its responses. No perimeter is crossed. The model does the work.

Yes, and this is a common attack pattern. Prompt injection (LLM01) is frequently the technique used to trigger LLM02. An attacker injects malicious instructions that override the system prompt’s confidentiality directives, and the model then discloses the sensitive information those directives were meant to protect. Addressing LLM02 without also addressing prompt injection leaves a significant gap.

Automated bots can send thousands of prompt variations in minutes, systematically testing what a model will and will not disclose. A human attacker manually probing an AI endpoint might find one extraction technique in an hour. A bot can test hundreds of techniques in the same window and do so at a scale that overwhelms any manual review. Rate limiting, behavioral bot detection, and anomaly monitoring are essential complements to output filtering when addressing LLM02 in production environments.

LLM02:2025 – Sensitive Information Disclosure and How to Prevent It