
Is This a Layer 7 DDoS or Something Else? A Diagnostic Guide for SREs and DevOps Engineers

A 2 AM alert fires. 5xx errors are spiking. Latency is through the roof. The instinct is to act immediately, but acting on the wrong diagnosis wastes the most critical minutes of the incident. At the application layer, a sophisticated Layer 7 DDoS attack looks identical to a bad deployment, a dependency failure, a CDN misconfiguration, or a DNS issue.

This guide gives Site Reliability Engineers (SREs) and DevOps engineers a structured path from “something is wrong” to “I know what this is” in under five minutes. Once you have a confident diagnosis, the response path is clear. If it points to a Layer 7 DDoS, follow our first 30 minutes action plan for the response playbook.

The 2 AM Checklist: Is It a DDoS or an Internal Failure?

In the first five minutes of an availability drop, your only goal is to identify the nature of the beast. Wasted time in the “diagnosis” phase is the biggest contributor to high Mean Time to Isolate (MTTI). Use the table below to match your current dashboard symptoms to the most likely root cause.

Use this as a reference throughout the diagnostic process. Work through Sections 1 through 3 before drawing a conclusion from the Layer 7 DDoS row.

Failure Mode        | Primary Tell                              | Where to Check First
Bad Deployment      | Error spike matches push window           | CI/CD change management logs
CDN Misconfig       | High origin TTFB vs. low edge latency     | CDN performance/cache-miss dashboard
Dependency Failure  | Selective 504s on specific API flows      | Upstream service mesh/API metrics
DNS Failure         | Traffic cliff + near-zero server CPU      | DNS resolver logs & ingress traffic
WAF or Edge Outage  | Edge 5xx errors with zero origin traffic  | Provider status page (e.g. Cloudflare)
Layer 7 DDoS        | RPS spike + high variance in source IPs   | Ingress WAF/load balancer logs


Once you have a baseline from your initial dashboard triage, you need a high-speed verification process. The table helps you narrow the field, but validate the specific cause before shifting into mitigation mode. Your first priority is to confirm that the service interruption is external by quickly auditing your internal state.

Minimizing MTTI: Ruling Out Internal Deployments and Configuration Changes

The goal of your first two minutes is to reduce your Mean Time to Isolate, or MTTI. This metric tracks how quickly you can point to the specific cause of a failure. In practice, that means determining whether the “attack” is actually a self-inflicted wound.

Start with the Change Management Check. Open your deployment logs and feature flag dashboard. Did a push happen in the last fifteen minutes? Did someone toggle a global configuration or update a WAF rule? If the timing of the latency spike aligns perfectly with a “Success” message in your CI/CD pipeline, the code is your primary suspect.
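As an illustration, the timing correlation can be checked mechanically. The helper below is a minimal sketch (the function name and the fifteen-minute window are illustrative, not tied to any specific CI/CD tool):

```python
from datetime import datetime, timedelta

def deploys_in_window(spike_start, deploy_times, window_minutes=15):
    """Return deploys that completed within `window_minutes` before the spike."""
    window = timedelta(minutes=window_minutes)
    return [t for t in deploy_times if timedelta(0) <= spike_start - t <= window]

# Hypothetical incident: latency spike at 02:00, two recent deploys on record.
spike = datetime(2024, 1, 1, 2, 0)
deploys = [datetime(2024, 1, 1, 1, 52), datetime(2024, 1, 1, 0, 30)]
# The 01:52 push lands inside the window and becomes the prime suspect.
suspects = deploys_in_window(spike, deploys)
```

Anything this check surfaces should be investigated (or rolled back) before you escalate to attack mitigation.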

Next, look at the Blast Radius. A true Layer 7 DDoS usually hits your ingress or a public-facing gateway, causing a broad failure across multiple nodes or regions. If the errors are localized to a specific microservice, a single pod, or a specific database shard, you are likely looking at an internal bottleneck or a logic error. Malicious traffic rarely picks and chooses which pod to crash; it overwhelms the entire entry point.
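A rough way to quantify blast radius is to ask what share of the 5xx errors the single worst pod is carrying. This sketch assumes you can export per-pod error counts; the pod names are hypothetical:

```python
def blast_radius(errors_by_pod):
    """Share of total 5xx errors carried by the single worst pod."""
    total = sum(errors_by_pod.values())
    if total == 0:
        return 0.0
    return max(errors_by_pod.values()) / total

# Localized failure: one pod owns nearly all the errors -> internal bug.
localized = blast_radius({"pod-a": 980, "pod-b": 10, "pod-c": 10})
# Broad failure: errors spread evenly across pods -> suspect the ingress.
broad = blast_radius({"pod-a": 340, "pod-b": 330, "pod-c": 330})
```

A number near 1.0 points inward; a number near 1/N across N pods points at the front door.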

Finally, use the Rollback Signal. If you have a recent deployment, trigger a rollback immediately. In a healthy environment, the error rate should begin to plateau or dip within two to three minutes as the old, stable code takes over. If you roll back and the 5xx errors continue to climb at the same trajectory, stop looking at your code. The call is coming from outside the house. Move on to analyzing the ingress traffic.

Differentiating Network and Dependency Outages from Layer 7 Attacks

If your internal environment is stable, check the infrastructure between your users and your servers. Three external failure modes frequently mask themselves as Layer 7 attacks. Identifying them early prevents you from wasting time on a mitigation strategy that will not work.

A CDN misconfiguration is the first suspect. A CDN, or Content Delivery Network, is the distributed system that caches your content near your users to reduce latency. When a CDN is configured incorrectly, your users experience high response times that look like a targeted attack. The concrete metric to check here is your Origin Time to First Byte, or Origin TTFB. This tracks how long it takes your server to respond to a request from the CDN. If your Client-to-Edge latency is normal but your Edge-to-Origin latency is spiking, your CDN is likely struggling with a cache-miss storm or a bad routing rule.
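The comparison can be expressed as a simple rule. The thresholds below (100 ms for a healthy edge, 500 ms for a slow origin) are illustrative placeholders, not recommendations; tune them to your own baselines:

```python
def cdn_verdict(edge_latency_ms, origin_ttfb_ms,
                edge_ok_ms=100, origin_slow_ms=500):
    """Healthy client-to-edge latency combined with a slow Edge-to-Origin
    TTFB implicates the CDN/origin path rather than an attack."""
    if edge_latency_ms < edge_ok_ms and origin_ttfb_ms > origin_slow_ms:
        return "origin-side: likely cache-miss storm or bad routing rule"
    return "latency pattern does not implicate the CDN"

# Edge is fast (40 ms) but the origin takes 1.8 s to send its first byte.
verdict = cdn_verdict(40, 1800)
```

If both edge and origin latency are elevated together, keep working down the list instead.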

Next, consider a WAF or Global Edge Outage. If a provider like Cloudflare experiences a regional or global incident, your site will disappear for a significant portion of your users. This looks like a total collapse in traffic at your origin, but your edge monitoring will show a massive spike in 5xx errors. The concrete tell is a mismatch between edge status codes and origin logs. If your edge is reporting gateway timeouts but your ingress controller sees no traffic, the security layer itself is the bottleneck. Verify this by checking the public status page of your WAAP provider.
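This mismatch check is mechanical once you have both numbers. The rates and the traffic floor used here are assumptions for illustration:

```python
def edge_outage_suspected(edge_5xx_rate, origin_rps, origin_floor_rps=1.0):
    """The edge reporting a high 5xx rate while the origin sees almost no
    traffic points at the security/edge layer itself, not the application."""
    return edge_5xx_rate > 0.5 and origin_rps < origin_floor_rps

# Edge dashboards show 90% gateway timeouts; origin ingress is silent.
suspected = edge_outage_suspected(0.9, 0.0)
```

When this fires, check the provider status page before touching anything on your side.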

Next, investigate dependency failures. Modern applications rely on many third-party services, such as payment processors or authentication providers. When one of these external services fails, it triggers 504 Gateway Timeout errors. A dependency failure is almost always selective. You will see errors on specific API endpoints or user flows, while the rest of your site remains healthy. Check your upstream success rates for each specific service. If your login page is timing out but your homepage is loading perfectly, you are likely dealing with a provider outage rather than a broad infrastructure attack.
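Selectivity is easy to test for if you can export per-endpoint error rates. The 20 percent threshold and the endpoint names below are arbitrary example values:

```python
def selective_failures(error_rate_by_endpoint, threshold=0.2):
    """Split endpoints into failing vs. healthy. A short failing list next
    to a long healthy list is the dependency-failure signature."""
    failing = [e for e, r in error_rate_by_endpoint.items() if r >= threshold]
    healthy = [e for e, r in error_rate_by_endpoint.items() if r < threshold]
    return failing, healthy

failing, healthy = selective_failures({
    "/login": 0.85,            # e.g. auth provider timing out
    "/checkout": 0.02,
    "/": 0.01,
    "/api/v1/products": 0.03,
})
```

One failing flow among many healthy ones points upstream; uniform failure everywhere points back at the ingress.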

Finally, rule out a DNS failure. DNS, or Domain Name System, is the service that translates your web address into a machine-readable IP address. When your DNS provider has an outage, it creates a traffic cliff. You will see your request volume drop to near zero in seconds. Your server CPU usage will also drop to almost zero because no traffic is reaching your ingress controller. In a Layer 7 attack, your CPU usage would be rising due to the heavy volume of requests. If your metrics show a total collapse in traffic and a quiet server, your DNS resolution is the root cause.
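A quick sketch of the DNS-cliff test, assuming you can read current RPS, a traffic baseline, and CPU utilization from your metrics store (the 5 and 10 percent cut-offs are illustrative):

```python
def dns_cliff(current_rps, baseline_rps, cpu_util):
    """Traffic collapsing to near zero with an idle CPU suggests DNS, not
    an attack: a Layer 7 flood drives CPU up, not down."""
    traffic_collapsed = current_rps < 0.05 * baseline_rps
    server_idle = cpu_util < 0.10
    return traffic_collapsed and server_idle

# DNS outage shape: 10 RPS against a 5,000 RPS baseline, 3% CPU.
is_dns = dns_cliff(10, 5000, 0.03)
# Layer 7 attack shape: 15,000 RPS against the same baseline, 95% CPU.
is_attack_shape = dns_cliff(15000, 5000, 0.95)
```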

Confirming the Layer 7 DDoS Signature through Telemetry

A sudden Requests Per Second (RPS) spike that has no obvious business cause is often the first indicator. If traffic triples in minutes and no marketing campaigns are active, the ingress logs usually show a surge that does not follow the typical daily curve.
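A baseline comparison like this can be scripted. The three-times multiplier and the hour-keyed baseline table are illustrative assumptions, not product defaults:

```python
def rps_anomaly(current_rps, baseline_by_hour, hour, factor=3.0):
    """Compare current RPS with the typical value for this hour of day;
    a multiple of the baseline with no business explanation is suspect."""
    return current_rps >= factor * baseline_by_hour[hour]

# 2 AM is normally quiet (400 RPS); 1,500 RPS at 02:00 trips the check.
anomalous = rps_anomaly(1500, {2: 400}, hour=2)
```

An hour-of-day baseline matters because a flat global average hides the daily traffic curve the attack is breaking.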

The second signal appears as endpoint concentration on uncacheable routes like login pages, search queries, or checkout flows. These paths are targeted to bypass the CDN and hit the origin database directly. Telemetry often shows a disproportionate amount of traffic hitting a specific path, such as /api/v1/search, while homepage traffic remains flat. A quick check of the cache-miss ratio on edge nodes usually confirms a shift toward 100 percent misses on the targeted path.
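Path concentration falls out of a simple frequency count over recent access-log entries, for example:

```python
from collections import Counter

def path_concentration(request_paths):
    """Return the hottest path and the fraction of all requests hitting it."""
    counts = Counter(request_paths)
    path, hits = counts.most_common(1)[0]
    return path, hits / len(request_paths)

# Fabricated log slice: 90% of requests hammer one uncacheable API route.
paths = ["/api/v1/search"] * 90 + ["/"] * 8 + ["/about"] * 2
hot_path, share = path_concentration(paths)
```

Cross-check the hot path against your edge cache-miss ratio: an attacked route will sit at or near 100 percent misses.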

High variance in source IPs combined with uniform request behavior provides further confirmation. In a legitimate surge, users arrive at different times and browse different pages. A botnet is more coordinated, with thousands of unique IPs making the exact same request at fixed intervals. Counting the requests per IP over a sixty-second window often reveals a cluster of addresses all making an identical number of requests per minute.
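This check is a one-liner over a sixty-second slice of the ingress log: count requests per IP and look at the spread. A population standard deviation near zero across many IPs is the tell (the log data here is fabricated):

```python
from collections import Counter
from statistics import mean, pstdev

def cadence_uniformity(ip_log):
    """Requests-per-IP over the window: return (mean, spread). Near-zero
    spread across many distinct IPs is the coordinated-botnet signature."""
    counts = list(Counter(ip_log).values())
    return mean(counts), pstdev(counts)

# 50 bot IPs each making exactly 60 requests in the window.
bot_log = [f"10.0.0.{i}" for i in range(50) for _ in range(60)]
avg, spread = cadence_uniformity(bot_log)
```

Human traffic produces a ragged distribution; a spread close to zero at high volume is machine-like.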

The final confirmation often comes from anomalous User-Agent patterns. While these strings identify the browser or device, a massive volume of identical browser strings across thousands of different IPs is statistically impossible for human traffic. When the most active IPs are all using the exact same version of a browser to hit a single uncacheable route, a Layer 7 DDoS is almost certainly the cause.
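To quantify the anomaly, measure what share of distinct source IPs present the single most common User-Agent string. The sample records below are fabricated for illustration:

```python
from collections import Counter

def ua_concentration(records):
    """records: iterable of (client_ip, user_agent). Return the top UA and
    the share of distinct IPs presenting it."""
    top_ua, _ = Counter(ua for _, ua in records).most_common(1)[0]
    all_ips = {ip for ip, _ in records}
    top_ips = {ip for ip, ua in records if ua == top_ua}
    return top_ua, len(top_ips) / len(all_ips)

# 95 of 100 distinct IPs present the exact same outdated browser string.
records = [(f"203.0.113.{i}", "Mozilla/5.0 Chrome/91.0") for i in range(95)]
records += [(f"198.51.100.{i}", f"RealBrowser/{i}") for i in range(5)]
top_ua, ip_share = ua_concentration(records)
```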

The 60-Second Verdict: A Layer 7 Confirmation Checklist

Once the telemetry has been analyzed, a quick final check helps ensure the mitigation path is the correct one. This list is designed to be answered using the dashboards already open in the middle of the incident.

  • Internal and Path Stability: Have recent deployments and feature flag changes been ruled out? Are the status pages for your CDN, WAF, and DNS providers showing green?
  • Unexplained RPS Surge: Is the current Requests Per Second (RPS) spike unrelated to any scheduled marketing events, product launches, or known viral traffic?
  • Endpoint Hotspots: Is the traffic concentrated on expensive, uncacheable routes like /search, /login, or POST-heavy API endpoints?
  • Uniform Behavior: Do the logs show thousands of unique IPs all making requests at the exact same cadence or following the same rigid path through the application?
  • User-Agent Anomalies: Is there a massive volume of identical or implausibly generic browser strings—such as an outdated Chrome version—hitting your origin?
  • Resource Exhaustion Order (L7 vs. L4): Is the application showing CPU or database connection exhaustion while your ingress bandwidth remains well below your pipe’s capacity? If you see pinned CPU with low throughput, it is a Layer 7 signature.
  • The “Whack-a-Mole” Signal: Have initial attempts to block the top most active IPs resulted in zero improvement to the overall 5xx error rate? If the error rate is indifferent to IP blocking, the attack is highly distributed.

Scoring the Verdict

A majority of “Yes” answers confirms a distributed Layer 7 attack. In this scenario, the standard response—scaling more pods or blocking individual IPs—will likely fail. The application logic itself is being exploited to exhaust the backend. The next move is to activate the DDoS mitigation layer or reach out to an emergency response provider.
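The checklist lends itself to a simple majority-vote helper. The key names and verdict strings below are illustrative, not from any product:

```python
def layer7_verdict(answers):
    """Majority-vote over the 60-second checklist: a majority of True
    (yes) answers points toward a distributed Layer 7 attack."""
    yes = sum(answers.values())
    if yes > len(answers) / 2:
        return "layer7-ddos: activate mitigation / emergency response"
    return "look-alike: re-check deployments, CDN, dependencies, DNS"

checklist = {
    "internal_and_path_stable": True,
    "unexplained_rps_surge": True,
    "endpoint_hotspots": True,
    "uniform_behavior": True,
    "user_agent_anomalies": False,
    "cpu_exhausted_bandwidth_low": True,
    "ip_blocking_ineffective": True,
}
verdict = layer7_verdict(checklist)  # 6 of 7 yes -> attack posture
```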

If there is a majority of “No” answers or mixed results, the issue is likely a “look-alike.” A mixed result often points back to a dependency failure or a subtle CDN misconfiguration. In these cases, shifting to an attack mitigation posture could actually make the situation worse by blocking legitimate users or masking the real root cause in the infrastructure. Return to Section 2, re-examine the look-alike that most closely matches your current metrics, and validate it before shifting posture.

You Have a Diagnosis. Here Is What Happens Next.

Once the telemetry points to a confirmed Layer 7 signature, the diagnostic phase ends and the active defense phase begins. For an SRE, this is the most critical transition because every minute of indecision directly impacts the availability SLA. The goal is no longer to understand why the service is failing, but to restore stability by any means necessary.

If the internal and external checks confirm a malicious surge, the next step is a structured response. Go through the first 30 minutes action plan, which covers the specific technical maneuvers for filtering traffic at the ingress. This guide provides a parallel track for SRE, Security, and Communications teams to ensure the response is coordinated and the recovery is as fast as possible.

If the diagnosis remains unclear but the application is still bleeding, the most efficient path to recovery is to engage an Emergency Response Team through an “Under Attack” lifeline. This allows you to offload the filtering to experts so you can focus on the stability of your origin and your upstream dependencies.

Indusface

Indusface is a leading application security SaaS company that secures critical Web, Mobile, and API applications of 5000+ global customers using its award-winning fully managed platform that integrates web application scanner, web application firewall, DDoS & BOT Mitigation, CDN, and threat intelligence engine.

Frequently Asked Questions (FAQs)

How can I tell the difference between an organic traffic push/"flash crowd" and a botnet?

The most reliable way to differentiate between a viral marketing success and a Layer 7 attack is to look at your business conversion metrics. In a legitimate traffic spike, your search-to-cart or login-to-dashboard ratios usually remain stable even as volume grows. In a DDoS attack, you will see a massive surge in traffic to your entry points but a total collapse in downstream conversion. Real users navigate through a site. Bots stay on the target endpoint and hammer it until the resource is exhausted.

Why is my server CPU at 100 percent when my network bandwidth is only at 10 percent capacity?

This is the classic signature of a Layer 7 attack. Unlike a Layer 3 or 4 attack that tries to saturate your internet pipe with raw data, a Layer 7 attack targets the processing power of your application. By hitting uncacheable or expensive database queries, an attacker can crash your web servers using very little bandwidth. If your telemetry shows your application is struggling while your network throughput remains low, you are likely dealing with an application-layer threat.

Should I just scale my pods to handle the increased load?

In a Layer 7 DDoS scenario, auto-scaling is often a dangerous instinct. If you scale your infrastructure to meet the demand of a botnet, you are essentially feeding the fire with your own cloud budget—a tactic known as Economic Denial of Sustainability (EDoS). Because the traffic is malicious, the botnet will simply scale its requests to match your new capacity until your backend database or your credit limit fails. Unless you have effective filtering at the edge, scaling will only increase your “Mean Time to Recovery” and your monthly bill.

What is the fastest way to identify a "smoking gun" in my ingress logs?

Aggregate your logs by client IP and User-Agent string over a one-minute window. You are looking for high-volume outliers that show perfectly uniform behavior. In a normal environment, human traffic is messy and distributed. If your top fifty IP addresses are all making exactly ten requests per second and using the same outdated browser string, you have found your attack signature. This level of coordination is a clear indicator of a scripted botnet.

Is geo-blocking a safe first-step for mitigation?

Geo-blocking works if your attack traffic is concentrated in a region where you have zero customers. Modern botnets are globally distributed, so treat it as a secondary measure rather than a first response. A more surgical approach is to use rate-limiting based on the request patterns you identified in your logs rather than blocking entire geographic regions.

How do I know if the attack is bypassing my WAF?

If your WAF is active but your origin servers are still seeing a massive surge in 5xx errors, the attack is either bypassing the WAF or the rules are too broad. Check your origin logs for any traffic that does not have the expected headers from your security provider. If you see direct IP traffic hitting your load balancer, your “origin hide” strategy has failed. In this case, you must restrict your ingress to only allow traffic from your WAF provider’s IP ranges.

What should I do if my diagnosis is still mixed?

Treat it as a look-alike until one signal clearly dominates. Revert the most recent deployment or check your top dependency’s status page first. If neither resolves it and IP blocking has no effect on the error rate, shift to an “under attack” posture and engage your managed provider.
