
Cloudflare’s Outage – Key Takeaway, Design for Failures

Posted Date: June 22, 2022
Read Time: 4 min

Downtime and outages: are they common? While the downtime of small sites goes unnoticed, news of massive outages spreads fast and makes the headlines.

The recent internet outage took down many of the biggest sites, including Amazon, Discord, Canva, Crunchyroll, and Medium, and was reportedly caused by a change pushed by Cloudflare. The key point to notice here is that downtime is inevitable: this is not the first such incident, nor will it be the last.

Any service will go down at some point. When we presume that services will be up 100% of the time, we run into issues. This is where what Indusface refers to as Design for failure plays a critical role: a fundamental shift in thinking that ensures the impact of a failure is restricted to the services provided and nothing more.

This is the essential issue with this outage: its impact went beyond the non-availability of the service (security and CDN), and the business sites themselves went down while Cloudflare was recovering.

Could it Happen Again?

Cloudflare’s big outage on the morning of June 21 impacted various sites and caused login difficulties and crashes on multiple services. According to the Cloudflare blog, it took them more than an hour to recover completely. This was its most massive outage, but the internet infrastructure service provider also experienced similar issues in 2020.

Cloudflare is not the only CDN (Content Delivery Network) provider to be affected. Other high-profile companies, including Fastly and Akamai, have also experienced service outages. Downtime is not uncommon. Distributed systems are complex, and though a lot of energy is put into safe deployment, slip-ups happen, and outages cannot be avoided altogether.

How is AppTrana Prepared for Outage and Downtime?

“The Cloudflare outage has necessitated a rethink on how such outages, which could cripple business operations temporarily, can be overcome. More often than not, while choosing or building a service, there is a focus on the kind of features and capabilities the service offers. However, it is important to evaluate the service provider/vendor’s ability to support you in a service outage.” – Venkatesh Sundar, Founder and CMO of Indusface.

At Indusface, our experts believe that if we must fail, we should fail gracefully, with a Design for failure mechanism. By failing in the right way, we can roll back and minimize the consequences. Careful planning, proper architectural design, and quick resolution of failures can bring you back online fast enough to meet your uptime requirements.

Design for Failure 

With the fail-safe mechanism, you can choose whether to remain available or secure when something goes wrong. By default, if AppTrana can’t verify a request, it treats the request as malicious and blocks it.
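The choice between staying available and staying secure can be pictured as a fail-open versus fail-closed policy. The following is only a minimal sketch of that idea, not AppTrana's implementation: the names `FailMode`, `decide`, and `unavailable_inspector` are hypothetical, and the inspector is assumed to return `True` for a clean request, `False` for a malicious one, and `None` (or raise) when it cannot verify the request.

```python
from enum import Enum
from typing import Callable, Optional


class FailMode(Enum):
    FAIL_CLOSED = "fail_closed"  # prefer security: block what cannot be verified
    FAIL_OPEN = "fail_open"      # prefer availability: allow what cannot be verified


def decide(request: dict,
           inspect: Callable[[dict], Optional[bool]],
           mode: FailMode = FailMode.FAIL_CLOSED) -> str:
    """Return "allow" or "block" for a request.

    `inspect` is a hypothetical inspection function: True means clean,
    False means malicious, None (or an exception) means "could not verify".
    """
    try:
        verdict = inspect(request)
    except Exception:
        verdict = None  # inspection engine unavailable or timed out

    if verdict is True:
        return "allow"
    if verdict is False:
        return "block"
    # Unverifiable request: fall back to the configured fail mode.
    return "allow" if mode is FailMode.FAIL_OPEN else "block"


def unavailable_inspector(request: dict) -> Optional[bool]:
    """Stand-in for an inspection engine that is down (illustration only)."""
    raise TimeoutError("inspection engine not reachable")


if __name__ == "__main__":
    req = {"path": "/login"}
    print(decide(req, unavailable_inspector))                      # default: secure
    print(decide(req, unavailable_inspector, FailMode.FAIL_OPEN))  # opt-in: available
```

The default mirrors the behavior described above (unverifiable requests are blocked); switching the mode is the explicit trade of security for availability.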

Our WAF SLA is 99.99% uptime. However, there is always a slight probability of disruption due to unexpected technical issues. Besides ensuring availability, we treat failure as inevitable. We plan for it accordingly, with the intent to limit the impact of a failure to the services we provide, so that the website itself does not go down. Our WAF architecture prepares for failover with a separate component known as the bypass fleet.

In an outage like the one Cloudflare recently experienced, the bypass feature can be enabled on the fly to temporarily forward all requests to the backend servers in the target group. This lets you keep serving customers during such outages, restricting the impact to the acceleration and security services being unavailable for that time. Having the entire site go down is the larger issue, and a gap in “Design for Failure” in this case.


A Little Background on The Need for Bypass Fleet

The problem with a cloud WAF is that although traffic routed through the cloud is protected, anyone who knows the server IP can reach your server directly, bypassing the WAF. To avoid this, we provide origin protection in AppTrana.

During every onboarding, we let the customer know the set of IPs from which they will receive requests. This IP range can be whitelisted in their network to protect the origin. But this also brings an operational challenge: if, for any reason such as what happened to Cloudflare, a customer needs to route traffic to the origin directly, they need to get the whitelist changed by their IT team, which is a complex, time-consuming process in big organizations.
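As an illustration of origin protection by IP whitelisting, the check at the origin amounts to asking whether a connection's source address falls inside the ranges the WAF provider has announced. This is a sketch under stated assumptions: the ranges below are documentation-reserved example networks (RFC 5737), not real AppTrana addresses, and `WAF_EGRESS_RANGES` and `is_from_waf` are hypothetical names.

```python
import ipaddress

# Hypothetical WAF egress ranges. These are RFC 5737 documentation blocks,
# used here only as placeholders for the IPs shared during onboarding.
WAF_EGRESS_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_waf(source_ip: str) -> bool:
    """Return True if the source address belongs to a whitelisted WAF range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in WAF_EGRESS_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "203.0.113.5"):
        action = "accept" if is_from_waf(ip) else "drop"
        print(f"{ip}: {action}")
```

In practice this rule usually lives in a firewall or security group rather than in application code, which is exactly why changing it during an incident tends to be slow.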

It is to avoid this that we have built the bypass fleet. The bypass fleet is a redundant architecture in our infrastructure: a simple TCP proxy that forwards traffic through the same IPs the customer has already whitelisted. So, in case of any failure, sites can be put in bypass to ensure availability while the WAF is brought back up. We give customers options on how they want to react during failures. This is what we call Design for failure.
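To make the idea of "a simple TCP proxy" concrete, here is a minimal sketch of a pass-through proxy built on Python's standard `asyncio` streams. It is not the bypass fleet itself: `pipe` and `start_bypass_proxy` are hypothetical names, and the hosts and ports are assumptions. The proxy does no inspection; it only relays bytes to the origin, so the origin still sees traffic arriving from the proxy's (already whitelisted) address.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def start_bypass_proxy(listen_host: str, listen_port: int,
                             origin_host: str, origin_port: int) -> asyncio.AbstractServer:
    """Start a pass-through TCP proxy that relays every connection to the origin."""

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        try:
            origin_reader, origin_writer = await asyncio.open_connection(
                origin_host, origin_port)
        except OSError:
            client_writer.close()  # origin unreachable: drop the client
            return
        # Relay both directions concurrently; no inspection is performed.
        await asyncio.gather(
            pipe(client_reader, origin_writer),
            pipe(origin_reader, client_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, listen_host, listen_port)


async def main() -> None:
    # Hypothetical addresses: listen on 8443, forward to an origin at 10.0.0.5:443.
    server = await start_bypass_proxy("0.0.0.0", 8443, "10.0.0.5", 443)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Because the relay works at the TCP level, TLS sessions pass through untouched. The cost is that no security or acceleration is applied while traffic takes this path.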

The feature has also had a natural side benefit: it is used in day-to-day operations when a customer wants to isolate a problem during changes at the origin. It allows us to fail gracefully and gives customers some control to decide how they want to react in such outages. This is one example of how we have built our system from the ground up around what we call Design for failure.

Continuous Monitoring for Failure

IT teams should not only be aware of how their servers look when uptime is 100%. They should also anticipate the changes to the environment that downtime incidents cause. Our continuous monitoring tracks the website around the clock and raises an alarm instantly in case of an outage or downtime. In addition, our real-time visualization and reports help you see future states with the context required to plan for failure.
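A basic version of such monitoring is a periodic probe plus a rule for when to raise an alarm. The sketch below is an assumption-laden illustration, not Indusface's monitoring: `probe` and `UptimeMonitor` are hypothetical names, and the alarm fires only after a configurable number of consecutive failed probes, a common way to avoid alerting on a single dropped request.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False


@dataclass
class UptimeMonitor:
    failure_threshold: int = 3     # consecutive failures before alarming (assumed)
    consecutive_failures: int = 0
    alarming: bool = False

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True only when an alarm newly fires."""
        if healthy:
            self.consecutive_failures = 0
            self.alarming = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold and not self.alarming:
            self.alarming = True
            return True
        return False


if __name__ == "__main__":
    monitor = UptimeMonitor(failure_threshold=2)
    # Simulated probe results; a real loop would call probe(url) on a schedule.
    for result in (True, False, False, False, True):
        if monitor.record(result):
            print("ALARM: site appears to be down")
```

Setting `failure_threshold=1` gives the "alarm instantly" behavior at the cost of more false positives.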

These features enable you to deliver secure, reliable, and highly responsive IT services.

What’s Next?

If you’re serious about your service availability, you should consider a paradigm shift. Invest in multiple layers of security to ensure data integrity, proactively design your system for failure, and have effective recovery plans in place so that you achieve high uptime and keep moving forward.

Look at every system in your architecture and ask whether it has been designed for failure, and whether it gives you enough control to react quickly when things fail, so that the scope of an outage stays at the service level and goes nothing beyond.

Stay tuned for more relevant and interesting security articles. Follow Indusface on Facebook, Twitter, and LinkedIn.

Vivek Gopalan

Vivekanand Gopalan is a seasoned entrepreneur and currently serves as the Vice President of Products at Indusface. With over 12 years of experience in designing and developing technology products, he has a keen eye for building innovative solutions that solve real-life problems. In his previous role as a Product Manager at Druva, Vivek was instrumental in creating the core endpoint data protection solution which helped over 1500 enterprises protect over a million endpoints. Prior to that, he served as a Product Manager at Zighra, where he played a crucial role in reducing online and offline payment fraud by leveraging mobile telephony, collective intelligence, and implicit user authentication. Vivek is a dynamic leader who enjoys building and commercializing products that bring tangible value to customers. In 2010, before pursuing MBA, he co-founded a technology product company, Warmbluke and created a first-of-its-kind innovative Civil Engineering estimator software called ATLAS. The software was developed for both enterprise and for SaaS users. The product helps in estimating the construction cost using CAD drawings. Vivek did his MBA from Queen's University with Specialization in New Ventures. He also holds a Bachelor of Technology degree in Information Technology from Coimbatore Institute of Technology, Anna University, one of the prestigious universities in India. He is the recipient of the D.D. Monieson MBA Award, Issued by Queen's School of Business, presented to a student team which has embraced the team-learning model and applied the management tools and skills to become a peer exemplar. In his spare time, Vivek likes to go on hikes and read books.
