DDoS Testing - Production vs. Staging
Explore the key differences between DDoS testing in production and staging environments to enhance your cybersecurity strategy.
Executive Summary
Introduction
This document addresses whether a DDoS simulation should be conducted against the production environment or the staging environment when both are available. It sets out the benefits of testing production, the operational risks and difficulties involved, and the controls applied during the engagement to keep those risks bounded.
Red Button's Recommendation
Red Button strongly encourages production testing. It provides the most accurate simulation of a real DDoS attack and avoids the false sense of security that staging environments often create because their security posture is not identical to production and their behavioral mitigation baselines are inaccurate. Our testing methodology is controlled: every test can be stopped immediately, the attack rate is increased gradually, and testing is available 24/7. This minimizes impact, eases regulatory approval, and maximizes the value of the test. For customers who are initially hesitant, starting with staging is acceptable, with a move to production recommended in subsequent engagements.
Advantages and risks
Why test production?
- The real security configuration is validated - Production testing evaluates the actual security posture that will be challenged during a real DDoS attack, including any misconfigurations it contains. When a staging environment is tested, the security engineers "mirror" the security rules, settings, and capacity from production, but we have seen cases where the mirroring was incomplete or not 100% identical. Furthermore, problematic rules in production usually escape the engineers' attention (otherwise they would already have been adjusted), which makes production testing crucial for identifying them.
- The risk of false positives - Staging environments usually have no real user traffic. Under DDoS stress, production faces legitimate traffic at the same time, which changes how protection measures such as rate limiters and behavioral mitigations behave, including whether they block legitimate users along with the attack. Staging also can't simulate real users retrying failed requests on top of the attack traffic.
- Managed mitigation baselines - A significant portion of modern mitigation measures is behavioral. Cloud WAF providers, scrubbing centers, and rate-limit engines often set their mitigation thresholds from observed traffic patterns. These baselines are learned from real users, real geographies, and real session shapes. Staging environments do not carry that history, so their detection logic behaves differently when attack traffic arrives. Production is the only environment in which the learned baseline can be validated.
- A real DDoS simulation on production evaluates the actual incident response process - In production tests, alerts are generated by the real monitoring dashboards and systems, SOC/NOC teams are notified, and playbooks are executed. Staging tests are often treated as drills rather than emergencies, so human response time and decision-making aren't truly validated. Vendor response teams, such as AWS SRT or Akamai SOCC, may also behave differently during an attack on production.
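To illustrate why a baseline learned in staging says little about production, here is a minimal sketch assuming a simple hypothetical detector that sets its threshold at the mean plus three standard deviations of the observed request rate. Commercial WAFs and scrubbing services use their own, generally undisclosed, algorithms; the function names and traffic numbers below are made-up assumptions for illustration only.

```python
import statistics


def learned_threshold(observed_rps: list[float], k: float = 3.0) -> float:
    """Hypothetical behavioral threshold: mean + k * population stdev of observed requests/sec."""
    mean = statistics.fmean(observed_rps)
    stdev = statistics.pstdev(observed_rps)
    return mean + k * stdev


def is_mitigated(current_rps: float, threshold: float) -> bool:
    """The hypothetical mitigation engages when the current rate exceeds the learned threshold."""
    return current_rps > threshold


# Made-up baselines (requests per second, one sample per minute).
production_history = [900.0, 1100.0, 1000.0, 950.0, 1050.0]  # real users
staging_history = [5.0, 3.0, 4.0, 6.0, 2.0]                  # health checks and testers only

prod_threshold = learned_threshold(production_history)
staging_threshold = learned_threshold(staging_history)

attack_rps = 150.0       # a modest attack
legit_prod_rps = 1000.0  # legitimate production traffic present during the attack

# Staging sees only the attack; production sees the attack on top of real users.
print(f"staging threshold: {staging_threshold:.1f} rps, "
      f"mitigated: {is_mitigated(attack_rps, staging_threshold)}")
print(f"production threshold: {prod_threshold:.1f} rps, "
      f"mitigated: {is_mitigated(legit_prod_rps + attack_rps, prod_threshold)}")
# staging threshold: 8.2 rps, mitigated: True
# production threshold: 1212.1 rps, mitigated: False
```

With these numbers, the 150 rps attack trips the staging threshold on its own, while in production the same attack on top of 1,000 rps of legitimate traffic stays under the learned threshold. A staging result would therefore suggest protection that production does not actually provide.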
The risks of testing production
- Impact - Running a DDoS test against production means stressing real, customer-facing services while customers are using them. If something goes wrong, the impact is real.
- Regulatory Requirements - Getting approval to test production isn't always straightforward. In regulated industries - financial services and healthcare being the obvious examples - the request usually has to pass through compliance, legal, and risk before anyone technical even weighs in, and sometimes a regulator needs to be notified on top of that. Each of those reviewers works on their own timeline. A few weeks of back-and-forth is normal; in heavier environments, it can stretch into months. By the time the green light arrives, the infrastructure has often changed enough that parts of the original test plan need to be rewritten.
- Rollback options are limited - Committing changes in production environments sometimes requires explicit approval or an approved change ticket, which makes adjusting the configuration during a test harder and sometimes impossible. In staging, by contrast, engineers are free to make these changes, whether to try new settings and configurations or to fix ones that were set up incorrectly.
Red Button’s Note
Red Button always encourages customers to test production. Production testing is the best way to simulate a real DDoS attack and to test the measures and configurations that will have to handle it. Testing the staging environment can give you a false sense of security: the security posture is rarely 100% identical, the thresholds and sensitivity of behavioral mitigation measures are not calibrated against real traffic, and the false-positive effect of the mitigation cannot be determined.
At Red Button, we have run over 2,000 controlled DDoS test simulations. Every test can be stopped immediately, and we craft each test to minimize impact on services outside its scope. During the test, we increase the attack rate gradually to keep the flow controlled. We also offer 24/7 availability, so you can test your environment outside business hours. This methodology lets you test production while minimizing the risk of uncontrolled impact, easing the regulatory approval process, and getting the most value out of the test.
That said, we understand that some customers may find it difficult to test production in the first engagement, and it is acceptable to run the first test in a staging environment. Once confidence in our controlled, planned process grows, we can move on to testing production in the second test.
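The control loop behind a gradual ramp-up with an immediate stop can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of Red Button's tooling: `ramp_schedule`, `run_ramp`, the doubling factor, and the health probe are all hypothetical, and no traffic is generated. The step function is a stand-in. In this sketch the stop signal and health check are evaluated between steps; a real controller would also interrupt a step that is already in progress.

```python
import threading
from typing import Callable


def ramp_schedule(start_rps: int, max_rps: int, factor: float = 2.0) -> list[int]:
    """Hypothetical ramp: multiply the rate by `factor` each step, capped at max_rps."""
    if start_rps <= 0 or factor <= 1.0:
        raise ValueError("start_rps must be positive and factor must be greater than 1")
    rates: list[int] = []
    rate = float(start_rps)
    while rate < max_rps:
        rates.append(int(rate))
        rate *= factor
    rates.append(max_rps)  # final step at the agreed maximum rate
    return rates


def run_ramp(
    rates: list[int],
    send_step: Callable[[int], None],
    service_healthy: Callable[[], bool],
    stop: threading.Event,
) -> list[int]:
    """Execute the steps in order, aborting before the next step if a stop was
    requested or the health probe reports degradation. Returns the executed rates."""
    executed: list[int] = []
    for rps in rates:
        if stop.is_set() or not service_healthy():
            break
        send_step(rps)
        executed.append(rps)
    return executed


# Usage with stand-ins: the "send" only records the step, and the hypothetical
# health probe reports degradation once three steps have run.
sent: list[int] = []


def fake_send(rps: int) -> None:
    sent.append(rps)


def fake_health() -> bool:
    return len(sent) < 3


stop_signal = threading.Event()
schedule = ramp_schedule(100, 1000)
executed = run_ramp(schedule, fake_send, fake_health, stop_signal)
print(schedule)  # [100, 200, 400, 800, 1000]
print(executed)  # [100, 200, 400]
```

Passing the stop signal as a `threading.Event` lets an operator on another thread (for example, a "stop now" button) halt the ramp without coordinating with the loop itself; setting it before `run_ramp` starts means no step is executed at all.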