CrowdStrike, a company whose job is to protect customers against adversaries and stop breaches, disrupted IT-reliant services across the globe. The company released a content configuration update for its Windows sensor that caused systems to crash. The crash affected both servers and personal computers running Microsoft Windows, leading to outages at banks, airlines, retail chains and more.
It is important to acknowledge that mistakes and slips are human. Based on CrowdStrike’s Root Cause Analysis (RCA) report, the absence of a staged rollout appears to have been a major mistake in its software development lifecycle. Without rollout phases, the update was never exercised in an environment that matched real-world conditions before it reached customers at large. The result was an instant blue screen of death (BSOD) for many users.
What Is a Canary Release?
In software development, a canary deployment involves gradually rolling out new code to a subset of users to identify issues before they affect the entire user base. This approach can help reduce risk and uncertainty around larger software releases. For example, a canary deployment can be used to test competing features, or to provide advanced features to power users while disabling them for new users.
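One common way to pick that subset is to hash a stable identifier, so the same machines stay in the canary group as the percentage grows. The sketch below is illustrative only and is not CrowdStrike's mechanism; the function name in_canary, the salt value and the host naming scheme are all assumptions made for this example.

```python
import hashlib


def in_canary(host_id: str, percent: float, salt: str = "release-1") -> bool:
    """Return True if this host falls inside the canary percentage.

    Hypothetical helper: the hash maps each host to a stable position in
    [0, 1), so a host selected at 5% is still selected at 20%.
    """
    digest = hashlib.sha256(f"{salt}:{host_id}".encode()).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return position < percent / 100.0


# Illustrative fleet of made-up host names.
hosts = [f"host-{i}" for i in range(10_000)]
canary_5 = [h for h in hosts if in_canary(h, 5)]
canary_20 = [h for h in hosts if in_canary(h, 20)]
print(len(canary_5), len(canary_20))
```

Changing the salt for each release reshuffles which machines go first, so the same hosts do not absorb the risk of every update.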
A phased rollout reduces user exposure to negative operational issues. Some users may still experience problems, but the canary model keeps this number low, and because rollback happens quickly, it minimizes the negative experience for individual users.
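To make the exposure argument concrete, here is a minimal sketch of a staged rollout that widens in steps and rolls back as soon as the failure rate crosses a threshold. It is an assumption-laden illustration, not any vendor's actual pipeline: staged_rollout, the stage percentages, the 1% failure threshold and the deploy, rollback and is_healthy callbacks are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[int],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> Tuple[bool, int]:
    """Deploy in growing stages; return (succeeded, hosts_exposed)."""
    updated = []
    for percent in stages:
        # Cumulative number of hosts that should be updated at this stage.
        target = int(len(hosts) * percent / 100)
        batch = list(hosts[len(updated):target])
        for host in batch:
            deploy(host)
        updated.extend(batch)

        # Check health of everything updated so far before widening.
        failures = sum(1 for host in updated if not is_healthy(host))
        if updated and failures / len(updated) > max_failure_rate:
            for host in updated:
                rollback(host)
            return False, len(updated)
    return True, len(updated)


# Simulated fleet: a faulty update crashes every host that receives it.
fleet = [f"host-{i}" for i in range(1000)]
crashed = set()
ok, exposed = staged_rollout(
    fleet,
    stages=(1, 10, 50, 100),
    deploy=crashed.add,
    rollback=crashed.discard,
    is_healthy=lambda host: host not in crashed,
)
print(ok, exposed)
```

In this simulation the faulty update is stopped after the first 1% stage, so only 10 of the 1,000 simulated hosts ever receive it, and all of them are rolled back.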
CrowdStrike states that the problematic update was developed and tested according to its standard software development processes. In the RCA, the company says the process has been improved since the outage: “While this scenario with Channel File 291 is now incapable of recurring, it informs the process improvements and mitigation steps that CrowdStrike is deploying to ensure further enhanced resilience.”
Further, the company said, “We apologize unreservedly and will use the lessons learned from this incident to become more resilient and better serve our customers.”
The Austin-based company is facing a barrage of lawsuits after the outage last month. Already, it has been sued by airline passengers whose flights were delayed or canceled.