LLM10: Unbounded Consumption – Understanding the OWASP Risk of Runaway AI Usage

Generative AI adoption is accelerating rapidly; over 75% of enterprise users now interact with GenAI tools, yet fewer than 40% of organizations have implemented controls to manage AI-related risks. As LLMs are exposed through APIs, copilots, and customer-facing workflows, attackers are increasingly targeting how these systems consume resources.

Large-scale adversarial testing has surfaced policy violations, including unauthorized actions and misuse. These findings highlight a growing pattern: attackers no longer need to overwhelm infrastructure to cause impact; they can exploit how models process requests.

The OWASP Top 10 for LLM Applications 2025 identifies this risk as LLM10: Unbounded Consumption. The risk occurs when LLM-powered applications allow uncontrolled use of compute resources, enabling a small number of requests to consume excessive tokens and GPU time. In metered environments, this can silently drain resources, slow down services, and impact availability without triggering traditional denial-of-service alerts.

Beyond Downtime: The Economic Attack Surface

Large Language Models are inherently resource-intensive. Every prompt consumes tokens and memory, and in cloud environments that consumption maps directly to cost. When usage controls are loose or absent, attackers can exploit how the model processes requests and turn routine interactions into a financial drain.

LLM10 highlights three ways this risk materializes.

Denial of Wallet attacks target pay-per-token pricing models. Rather than aiming for downtime, attackers focus on cost escalation. By submitting prompts designed to maximize inference complexity or output length, they drive sustained spend until operating the service becomes financially impractical.
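
To see the economics concretely, consider the rough back-of-the-envelope sketch below. The pricing figures are purely illustrative assumptions, not any provider's actual rates; the point is the multiplier between a routine request and an engineered one, not the exact numbers.

```python
# Illustrative Denial of Wallet arithmetic. Pricing is an assumption:
# $0.01 per 1K input tokens, $0.03 per 1K output tokens.
PRICE_IN_PER_1K = 0.01
PRICE_OUT_PER_1K = 0.03

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single inference under the assumed pay-per-token pricing."""
    return (input_tokens / 1000) * PRICE_IN_PER_1K + \
           (output_tokens / 1000) * PRICE_OUT_PER_1K

normal = request_cost(500, 300)        # a typical chat turn: ~$0.014
abusive = request_cost(8_000, 16_000)  # max-length in/out: ~$0.56 (~40x)

# A single client sustaining 2 such requests per second for one day:
daily_spend = abusive * 2 * 86_400     # roughly $97,000
print(f"{normal=:.3f} {abusive=:.2f} {daily_spend=:,.0f}")
```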

Resource exhaustion attacks concentrate on performance degradation. A single, carefully constructed prompt, using recursive instructions, oversized inputs, or reasoning-heavy tasks, can monopolize GPU resources. Even at low request volumes, this can increase latency or block access for legitimate users sharing the same infrastructure.

Model extraction is the most subtle and often the hardest to detect. Through persistent, high-volume querying, attackers can analyze responses, infer model behavior, and recreate functional equivalents of proprietary systems. The result is intellectual property loss that occurs quietly, without triggering traditional availability or security alerts.

Together, these patterns show why LLM security cannot focus solely on uptime. In AI-driven systems, cost, performance, and intellectual property are part of the same attack surface, and all three require active protection.

Common Unbounded Consumption Attack Patterns in LLM Applications

Research and incident analysis show that unbounded consumption follows a few repeatable patterns.

Context window saturation involves sending oversized or variable-length inputs designed to maximize memory usage and processing time. Even low request volumes can push models into inefficient execution paths.

Reasoning loop exploitation targets models optimized for multi-step reasoning. Carefully crafted prompts can keep the model engaged in extended internal evaluation, generating thousands of tokens from a single request and tying up compute far longer than expected.

Side-channel abuse relies on sustained querying to infer how a model behaves, such as its constraints, decision patterns, or architectural characteristics. Over time, this information can be used to support model extraction or bypass safeguards.

These techniques do not require large botnets or traffic floods. Precision, not scale, is what makes them effective.

Key Indicators of Unbounded Consumption in LLM Applications

One of the earliest indicators is a sharp increase in token consumption that does not align with user growth or feature adoption. When overall usage appears stable but token volumes rise disproportionately, it often points to prompts or workflows consuming far more resources than intended.

Performance degradation is another common signal. Teams may notice higher response times, intermittent latency spikes, or unexplained timeout errors across AI-driven features. These issues often stem from shared compute resources being saturated by a small number of expensive inference requests.

Cloud cost anomalies provide a more concrete warning. Billing alerts triggering earlier than expected in the month, or costs rising faster than forecasted, can indicate sustained inference abuse rather than organic growth. In many cases, these alerts are the first visible symptom of unbounded consumption.

At the infrastructure level, persistently high GPU utilization under otherwise normal traffic conditions is a strong red flag. When GPUs remain near capacity without corresponding increases in request volume, it suggests that a subset of requests is monopolizing compute resources.
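
As a minimal sketch of this check (assuming NVIDIA GPUs with nvidia-smi available, plus a caller-supplied source of request-rate data), the watcher below flags sustained saturation that is out of step with traffic. All thresholds are assumptions to tune per fleet.

```python
# Sketch: alert on sustained GPU saturation under otherwise normal traffic.
import subprocess
import time

GPU_UTIL_THRESHOLD = 90  # percent; an assumed ceiling
SUSTAINED_SAMPLES = 12   # 12 samples x 5s poll = 1 minute of saturation

def gpu_utilization() -> float:
    """Average utilization across GPUs, read from nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    samples = [float(line) for line in out.strip().splitlines()]
    return sum(samples) / len(samples)

def watch(requests_per_min, baseline_rpm: float) -> None:
    """requests_per_min is a hypothetical callback supplied by the caller."""
    hot = 0
    while True:
        util, rpm = gpu_utilization(), requests_per_min()
        # Saturation without a matching rise in traffic is the red flag.
        if util >= GPU_UTIL_THRESHOLD and rpm <= baseline_rpm:
            hot += 1
            if hot >= SUSTAINED_SAMPLES:
                print(f"ALERT: GPUs at {util:.0f}% at normal traffic ({rpm:.0f} rpm)")
                hot = 0
        else:
            hot = 0
        time.sleep(5)
```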

By the time finance teams flag unexpected spend, unbounded consumption has often been active for days or weeks. Identifying these signals early allows security and platform teams to intervene before resource exhaustion, service degradation, or runaway costs translate into business impact.

Defending Against Unbounded Consumption

Defending against unbounded consumption begins with a fundamental shift in how LLM systems are viewed: tokens, compute, and execution time must be treated as security boundaries, not just operational metrics. Without explicit controls, even legitimate-looking interactions can escalate into resource exhaustion or cost abuse.

A foundational control is strict input validation. Inputs should be constrained to reasonable sizes based on the actual business use case. If an application does not require long documents, oversized payloads should never reach the model. Limiting input size early prevents excessive token expansion and reduces downstream compute impact.
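
A minimal validation sketch, assuming a tiktoken-style tokenizer with an encode() method; the character and token limits are illustrative and should be sized to the actual use case:

```python
# Gate input size early, before any tokens reach the model.
MAX_INPUT_CHARS = 8_000    # cheap pre-check before tokenizing
MAX_INPUT_TOKENS = 2_000   # sized to the business use case, not the model max

class InputTooLarge(Exception):
    pass

def validate_prompt(text: str, tokenizer) -> str:
    if len(text) > MAX_INPUT_CHARS:
        raise InputTooLarge(f"{len(text)} chars exceeds {MAX_INPUT_CHARS}")
    n_tokens = len(tokenizer.encode(text))
    if n_tokens > MAX_INPUT_TOKENS:
        raise InputTooLarge(f"{n_tokens} tokens exceeds {MAX_INPUT_TOKENS}")
    return text  # only validated input ever reaches the model
```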

Rate limiting must also evolve beyond simple request counts. Traditional per-IP or per-user limits are insufficient for LLM workloads. Effective protection requires quotas tied to cumulative token usage, inference time, or overall resource consumption. When defined thresholds are crossed, throttling should occur automatically to prevent a small number of requests from monopolizing resources.
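
One way to express this, sketched below with assumed limits and in-memory storage, is a per-user budget keyed on cumulative tokens over a sliding window rather than raw request counts:

```python
# Token-based quota: throttle on cumulative tokens, not request count.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
TOKEN_BUDGET = 200_000  # tokens per user per hour; an assumed tier limit

_usage: dict[str, deque] = defaultdict(deque)  # user -> (timestamp, tokens)

def charge(user_id: str, tokens: int) -> bool:
    """Record usage; return False (throttle) once the budget is exhausted."""
    now = time.monotonic()
    window = _usage[user_id]
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()  # drop entries that fell out of the window
    spent = sum(t for _, t in window)
    if spent + tokens > TOKEN_BUDGET:
        return False  # reject: token budget exceeded, not request count
    window.append((now, tokens))
    return True
```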

Equally important are timeouts and throttling for long-running inference. Requests that exceed expected execution windows should be terminated decisively. This prevents reasoning loops or complex prompts from tying up GPUs indefinitely and degrading performance for other users.
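
A minimal sketch using asyncio.wait_for to enforce a hard wall-clock budget; the call_model coroutine and the 30-second window are assumptions for illustration:

```python
# Hard cap on per-request inference time.
import asyncio

INFERENCE_TIMEOUT_S = 30  # expected p99 plus headroom, not the model's max

async def bounded_inference(call_model, prompt: str) -> str:
    try:
        # Cancels the task (freeing the GPU slot) when the budget expires.
        return await asyncio.wait_for(call_model(prompt), INFERENCE_TIMEOUT_S)
    except asyncio.TimeoutError:
        # Terminate decisively; surface a denial response upstream.
        raise RuntimeError("inference exceeded execution window")
```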

To contain the blast radius of abuse, organizations should apply resource isolation and sandboxing techniques. Restricting an LLM’s access to internal services, APIs, and network resources limits both insider misuse and side-channel attacks, while enforcing clear boundaries on what the application can access and consume.
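
A simple way to enforce such boundaries is a deny-by-default allowlist in front of any model-initiated call, as in the sketch below; the tool names and hosts are hypothetical placeholders:

```python
# Deny-by-default gate for LLM-initiated tool and network access.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"internal-search.example.com", "kb.example.com"}
ALLOWED_TOOLS = {"search_kb", "lookup_order"}

def authorize_tool_call(tool: str, url: str) -> None:
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not on the allowlist")
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {host!r} is blocked")
    # Only explicitly allowed tools and destinations are ever reachable
    # from model output, narrowing both misuse and side channels.
```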

Continuous logging, monitoring, and anomaly detection provide the visibility needed to detect unbounded consumption early. Monitoring token velocity, execution duration, and cost acceleration in real time allows teams to identify abnormal patterns before they translate into service degradation or unexpected cloud spend.
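
The sketch below illustrates one such signal, token velocity, by comparing recent throughput against a slowly updated baseline; the window size and spike factor are assumptions to tune per deployment:

```python
# Token-velocity alarm: flag throughput spikes against a rolling baseline.
import time
from collections import deque

class TokenVelocityMonitor:
    def __init__(self, window_s: int = 300, spike_factor: float = 3.0):
        self.window_s = window_s
        self.spike_factor = spike_factor
        self.events: deque = deque()  # (timestamp, tokens)
        self.baseline: float = 0.0    # EWMA of tokens/sec

    def record(self, tokens: int) -> bool:
        """Return True when current velocity spikes above the baseline."""
        now = time.monotonic()
        self.events.append((now, tokens))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        velocity = sum(t for _, t in self.events) / self.window_s
        spiking = self.baseline > 0 and velocity > self.spike_factor * self.baseline
        # Update the baseline slowly so short attacks cannot hide themselves.
        self.baseline = 0.99 * self.baseline + 0.01 * velocity
        return spiking
```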

Systems should also be designed for graceful degradation. Under heavy load or sustained abuse, partial functionality is far preferable to total failure. Limiting queued actions, capping concurrent operations, and scaling predictably help maintain availability even when demand spikes or attacks occur.
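
A minimal sketch of bounded concurrency with load shedding, assuming an async inference call; both caps are illustrative and should be sized to GPU capacity:

```python
# Cap in-flight inferences and shed load instead of queueing indefinitely.
import asyncio

MAX_CONCURRENT = 16  # assumed GPU capacity
MAX_QUEUED = 32      # beyond this, reject rather than stack requests

_slots = asyncio.Semaphore(MAX_CONCURRENT)
_waiting = 0

async def degrade_gracefully(call_model, prompt: str) -> str:
    global _waiting
    if _waiting >= MAX_QUEUED:
        # Partial functionality beats total failure: fast, explicit rejection.
        raise RuntimeError("service busy, retry later")
    _waiting += 1
    try:
        async with _slots:  # at most MAX_CONCURRENT requests on the GPUs
            return await call_model(prompt)
    finally:
        _waiting -= 1
```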

Finally, strong governance and access controls are essential. Role-based access control, least-privilege principles, centralized model inventories, and automated MLOps pipelines ensure that only authorized models, configurations, and deployments reach production. Combined with adversarial robustness training and output controls such as glitch token filtering or watermarking, these measures reduce the risk of extraction, abuse, and uncontrolled scaling.

Together, these controls ensure that unbounded consumption is managed proactively by protecting availability, cost, and system integrity without relying on after-the-fact billing alerts or infrastructure failures.

Indusface

Indusface is a leading application security SaaS company that secures critical Web, Mobile, and API applications of 5000+ global customers using its award-winning fully managed platform that integrates web application scanner, web application firewall, DDoS & BOT Mitigation, CDN, and threat intelligence engine.

Frequently Asked Questions (FAQs)

Why is unbounded consumption considered a security risk for LLM applications?

Unbounded consumption is a security risk because it allows excessive use of tokens and compute without clear limits. Attackers can exploit this to degrade performance, inflate cloud costs, or extract model behavior, even when traffic volumes remain low and appear legitimate.

How is unbounded consumption different from traditional DDoS attacks?

Unlike traditional DDoS attacks that rely on high traffic volumes to overwhelm infrastructure, unbounded consumption exploits how LLMs process individual requests. Even low request volumes can exhaust GPU resources or inflate costs by forcing the model into expensive inference paths, making detection harder and impact more subtle.

What are common attack techniques used in LLM unbounded consumption?

Common techniques include context window saturation using oversized inputs, reasoning loop exploitation that keeps models engaged in extended evaluation cycles, and side-channel abuse through sustained querying to infer model behavior. These attacks rely on precision rather than scale and often evade standard rate-limiting controls.

What are the early warning signs of unbounded consumption in LLM applications?

Early indicators include abnormal spikes in token usage without corresponding user growth, increased response latency, persistent high GPU utilization, and unexpected cloud cost acceleration. In many cases, billing anomalies are the first visible signal, even though the abuse may have been active for days or weeks.

How can organizations prevent unbounded consumption in LLM-based systems?

Preventing unbounded consumption requires treating tokens and compute as security boundaries. This includes enforcing input size limits, implementing token- and time-based quotas, applying strict inference timeouts, isolating resources, and continuously monitoring token velocity, execution duration, and cost anomalies to detect abuse early.
