CrowdStrike meets Murphy's Law: Anything that can go wrong will

And boy, did last Friday's Windows fiasco ever prove that yet again


Opinion

CrowdStrike's recent Windows debacle will surely earn a prominent place in the annals of epic tech failures. On July 19, the cybersecurity giant accomplished what legions of hackers could only dream of – bringing millions of Windows systems worldwide to their knees with a single botched update.

As a veteran tech journalist, I've seen my fair share of software snafus. Heck, I went hand-to-hand with the grandpa of all network blow-ups – the Morris Worm – in 1988 when I was a sysadmin. Even so, I can't help but marvel at the sheer scale and impact of this blunder. CrowdStrike, a company valued at over $70 billion and trusted by countless organizations to protect their digital assets, inadvertently became the source of one of the largest IT outages in history.

The fallout from this debacle was staggering – thousands of flights canceled, healthcare services disrupted, and 911 systems knocked offline. It's a stark reminder of how deeply intertwined our digital infrastructure has become and how vulnerable it can be to a single point of failure.

Let's break down the cascade of errors that led to this fiasco.

In the beginning, Microsoft enabled CrowdStrike's Falcon security software to run in kernel mode, at ring 0, the most privileged level of Windows. A serious fault at this level will likely cause a Blue Screen of Death (BSOD), because Windows halts the whole system rather than keep running with a possibly corrupted kernel. Meanwhile, Microsoft reportedly wants to blame the European Commission – no, really – for requiring it to grant third-party software vendors this level of access.

You know, I think that with all of Microsoft's developers and lawyers, the company could come up with a better, legal way to avoid this kind of foul-up while still letting software companies compete on equal terms. It's not rocket science.

Microsoft doesn't want any of the blame, but it deserves some of it. For far too long, we've placed too many vital IT eggs in the Windows basket. When that basket falls, so does much of the economy.

Returning to CrowdStrike, the company claims a "logic error" in a routine sensor configuration update caused the meltdown. But for a company of CrowdStrike's caliber, such a fundamental mistake is inexcusable. This wasn't some obscure edge case – it was a critical failure in its core functionality.

It wasn't even a code problem. This wasn't a software update per se. The villain of this piece was a Falcon configuration file called a channel file. One simple file, which should have held nothing more than data to update a security setting, set off a cascade of one BSOD after another.
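Code doesn't have to change for a data file to crash it; all it takes is code that trusts the file. Below is a deliberately simplified sketch of that failure mode. The rule format, field names, and functions are hypothetical and are not CrowdStrike's actual channel-file format: a rule read from the file names which input value to inspect, and an interpreter that doesn't bounds-check that index fails when the file points past the inputs it was given.

```python
# Hypothetical, simplified content interpreter. NOT CrowdStrike's real format.
from dataclasses import dataclass


@dataclass
class Rule:
    field_index: int  # which input value to inspect; comes from the data file
    pattern: str      # value that counts as a match


def match_unchecked(rule: Rule, inputs: list[str]) -> bool:
    # Trusts the data file. An out-of-range index raises IndexError here;
    # the equivalent mistake in kernel-mode C is an out-of-bounds memory read,
    # which can bring down the whole machine.
    return inputs[rule.field_index] == rule.pattern


def match_checked(rule: Rule, inputs: list[str]) -> bool:
    # Defensive version: a rule that points past the supplied inputs is
    # treated as a non-match instead of reading data that isn't there.
    if not 0 <= rule.field_index < len(inputs):
        return False
    return inputs[rule.field_index] == rule.pattern


inputs = ["cmd.exe", "C:\\Temp"]                    # the caller supplies two values
bad_rule = Rule(field_index=2, pattern="anything")  # but the file asks for a third

print(match_checked(bad_rule, inputs))  # False
try:
    match_unchecked(bad_rule, inputs)
except IndexError as exc:
    print("unchecked lookup failed:", exc)
```

In Python the unchecked version merely raises an exception; in privileged C code there is no such safety net, which is why content consumed at ring 0 deserves the paranoid version.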

How did such a catastrophic bug pass quality assurance? CrowdStrike admitted: "Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data [and] were deployed into production." When your software has deep hooks into millions of Windows systems, your testing should be bulletproof. Clearly, CrowdStrike's testing protocols need a massive overhaul.
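CrowdStrike's statement describes a validator that let problematic content data through. To make the idea concrete, here is a minimal sketch of a pre-deployment check. The `TemplateInstance` name echoes CrowdStrike's wording, but the structure, the `SUPPLIED_INPUTS` limit, and the functions are hypothetical: each instance lists the input fields its criteria inspect, and the validator rejects any field the sensor doesn't actually supply.

```python
# Hypothetical pre-deployment content validator; the format and limits are assumptions.
from dataclasses import dataclass

SUPPLIED_INPUTS = 8  # assumed number of input values the sensor code actually provides


@dataclass
class Criterion:
    field_index: int  # which input value this criterion inspects
    pattern: str      # value to match


@dataclass
class TemplateInstance:
    name: str
    criteria: list[Criterion]


def validate_instance(inst: TemplateInstance) -> list[str]:
    """Return the problems found; an empty list means the instance may ship."""
    problems = []
    if not inst.criteria:
        problems.append(f"{inst.name}: no criteria")
    for c in inst.criteria:
        if not 0 <= c.field_index < SUPPLIED_INPUTS:
            problems.append(
                f"{inst.name}: field {c.field_index} outside 0..{SUPPLIED_INPUTS - 1}"
            )
        if not c.pattern:
            problems.append(f"{inst.name}: empty pattern on field {c.field_index}")
    return problems


def validate_release(instances: list[TemplateInstance]) -> bool:
    """Fail closed: a single bad instance blocks the whole release."""
    problems = [p for inst in instances for p in validate_instance(inst)]
    for p in problems:
        print("REJECT", p)
    return not problems


good = TemplateInstance("ok", [Criterion(0, "cmd.exe")])
bad = TemplateInstance("bad", [Criterion(8, "x")])  # index 8 is one past the last input
print(validate_release([good]))       # True
print(validate_release([good, bad]))  # prints a REJECT line, then False
```

A validator is itself code, so when it has a bug, bad content sails through. That is why it can't be the only safeguard.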

We also now know, as security expert Kevin Beaumont pointed out on Mastodon: "The key takeaway – channel updates are currently deployed globally, instantly." I always send major patches to all my customers simultaneously and wait to see what happens next. Doesn't everyone? Who are these people, and why does anyone let them do security work?

There's a simple concept called canary testing. You may have heard of it. Like the proverbial canary in a coal mine, you first test whether a new space – or program – is safe by trying it on a canary – or a small group of users – and then, if all's well, let everyone else in.
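Here is a minimal sketch of what a staged (canary) rollout loop might look like. The ring sizes, the crash-rate threshold, and the `deploy`/`crash_rate` callbacks are hypothetical stand-ins for a vendor's real fleet-management tooling. The point is only that each ring must stay healthy before the next, larger one receives the update.

```python
# Hypothetical staged (canary) rollout; ring sizes, threshold, and callbacks are assumptions.
from typing import Callable, Sequence

RINGS = (0.01, 0.10, 0.50, 1.00)  # assumed rings: 1%, 10%, 50%, then the whole fleet
MAX_CRASH_RATE = 0.001            # assumed threshold: halt if more than 0.1% of updated hosts crash


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[Sequence[str]], None],
    crash_rate: Callable[[Sequence[str]], float],
) -> bool:
    """Deploy ring by ring; return False as soon as the updated hosts look unhealthy."""
    done = 0  # hosts[:done] have already received the update
    for fraction in RINGS:
        target = max(1, int(len(hosts) * fraction))
        ring = hosts[done:target]
        if not ring:
            continue  # tiny fleets: this ring adds no new hosts
        deploy(ring)
        done = target
        rate = crash_rate(hosts[:done])  # health of everything updated so far
        if rate > MAX_CRASH_RATE:
            print(f"halting at {done}/{len(hosts)} hosts: crash rate {rate:.2%}")
            return False
    return True


fleet = [f"host{i}" for i in range(1000)]
updated: list[str] = []
# Simulate a catastrophic update: every host that receives it crashes.
ok = staged_rollout(fleet, updated.extend, lambda hs: 1.0)
print(ok, len(updated))  # False 10
```

With a catastrophic update, this loop stops after the first 1% of hosts instead of reaching all of them at once. A real rollout would also wait between rings so that crashes have time to show up in telemetry.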

Let's not forget that CrowdStrike's initial response was slow and inadequate. Users were left scrambling for answers while critical infrastructure faltered. Even today, almost a week later, I still have friends having trouble with their Delta flights.

This serves as a sobering wake-up call for the rest of us in the tech industry. As we rush to secure our systems against external threats, we must not overlook the potential for self-inflicted wounds. Rigorous testing, fail-safe mechanisms, and a healthy dose of humility are essential when dealing with critical systems.
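"Fail-safe" can mean many things. One common pattern is to keep the last configuration that loaded cleanly and fall back to it when a new file fails to parse or validate, instead of letting the failure take the system down. Below is a minimal sketch, assuming JSON config files with a hypothetical `rules` key; the function names are illustrative. A real kernel-level agent would also need something this sketch doesn't model, such as noticing that it crashed on the previous boot and refusing to reload the same content.

```python
# Hypothetical fail-safe loader: use a new config only if it parses and passes checks;
# otherwise keep running on the last known good copy.
import json
import shutil
import tempfile
from pathlib import Path


def parse_config(path: Path) -> dict:
    """Parse and sanity-check a config file; raise ValueError if it is unusable."""
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError) as exc:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError(f"cannot read {path.name}: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("rules"), list):
        raise ValueError(f"{path.name}: missing 'rules' list")
    return data


def load_with_fallback(new: Path, last_good: Path) -> dict:
    """Prefer the new file; on any problem, fall back instead of failing hard."""
    try:
        config = parse_config(new)
    except ValueError as exc:
        print("rejecting new config, keeping last known good:", exc)
        return parse_config(last_good)
    shutil.copyfile(new, last_good)  # promote: the new file becomes the last known good copy
    return config


with tempfile.TemporaryDirectory() as d:
    good, new = Path(d, "good.json"), Path(d, "new.json")
    good.write_text('{"rules": []}')
    new.write_text("not json")  # a corrupt update
    print(load_with_fallback(new, good))  # {'rules': []}
```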

In the end, CrowdStrike's Windows fiasco is a textbook example of Murphy's Law in action – anything that can go wrong will go wrong. It's a painful lesson but one that we would all do well to learn from. After all, in cybersecurity, your next big threat might just be an update away. ®