CrowdStrike Chaos Underscores Serious Downsides of Automatic Software Updates
July 26, 2024
4 min read
The recent CrowdStrike outage marked a watershed moment for IT and touched off a global conversation about the downsides of automatic software updates in the cloud era.
The incident affected approximately 8.5 million Windows devices across thousands of business IT environments. Entire health systems went offline, millions of paychecks were delayed, and airlines canceled an entire day’s worth of flights. Those that could get planes off the ground resorted to sharing crucial updates by hand on whiteboards.
While the media was quick to assign blame, experts say the technology industry’s shift to cloud-native modernization all but guaranteed something on CrowdStrike’s level (or worse) would eventually happen. Now that it has, the chaos it created provides a reminder that – despite the speed and convenience cloud-based systems can offer – there’s no substitute for a thorough and deliberate approach to updating software.
On July 19, 2024, CrowdStrike released an automated sensor configuration update to its Falcon sensor software, a “normal part of the sensor’s operation” that occurs “several times a day in response to novel tactics […] discovered by CrowdStrike,” according to the company’s blog.
Unfortunately, this particular update shipped with a coding error that caused Falcon to search for an invalid memory address within Windows, according to an article on DEV. The 8.5 million affected Windows devices took this unusual behavior as a sign of a potential threat and defaulted to the Blue Screen of Death to prevent further risk, an issue that persisted through reboots for most customers.
As systems around the world fell into a blue screen boot loop, software that interfaced with them also began to crash, setting off a ripple of outages that many companies are still struggling to fully resolve.
It would be an understatement to say a single faulty bit of code and an overlooked checking procedure should not sideline entire sectors of the global economy for a full day. But as the CrowdStrike outage shows, historic cyber events don’t always start with a bad actor or even a particularly notable mistake.
Instead, the events of July 19 hit on a larger trend: the loss of control companies have increasingly experienced over the years as they replace their on-premises software with cloud-based alternatives.
As intelligence provider GlobalData says in a Yahoo Finance article, the practice of customers vetting and testing updates before pushing them to users has faded with the move to cloud. Instead, businesses increasingly trust software vendors to carry out automatic updates.
While the change has created undeniable time savings, it also means customers are signing away a major responsibility, one their own IT departments managed themselves until the last several years.
Security providers like CrowdStrike are far from the only cloud software vendors that could create problems with an errant update. Any software that can push updates without customer intervention has at least some ability to create inadvertent trouble. Throw in the potential for cyberattackers to capitalize on disrupted updates (a key facet of certain supply chain attacks), and it’s easy to see why the enterprise would be motivated to solve this lingering downside of automatic updates.
As GlobalData says in its analysis, the only realistic path forward from an event like this starts with dialogue. High-level IT leaders need to discuss how their software update/patching processes have changed since adopting cloud technology and work to find practices that put oversight and implementation back in their control.
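One practice that puts oversight back in customer hands is a staged (ring-based) rollout, where an update reaches a small test group first and only moves on if those machines stay healthy. The Python sketch below illustrates the idea under stated assumptions: it presumes the vendor lets customers control when an update is applied, and every name in it (`Ring`, `staged_rollout`, `apply_update`, `is_healthy`) is hypothetical, not part of CrowdStrike's or any other vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A group of machines that receives an update at the same stage (hypothetical type)."""
    name: str
    hosts: Sequence[str]
    max_failure_rate: float  # fraction of hosts allowed to fail before the rollout halts


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Apply an update one ring at a time.

    Every host in a ring is updated and checked. If the ring's failure rate
    exceeds its budget, the rollout stops and later rings are never touched.
    Returns the names of the rings that passed.
    """
    completed: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            apply_update(host)          # assumed hook: install the update on one host
            if not is_healthy(host):    # assumed hook: e.g. host rebooted and reports in
                failures += 1
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > ring.max_failure_rate:
            break                       # halt before the update spreads any further
        completed.append(ring.name)
    return completed
```

With rings ordered from a small lab group to the full fleet, an update that crashes the lab machines never reaches production. The trade-off is time: each ring adds a delay before everyone is protected, which is why the failure budget and ring sizes are decisions for IT leadership rather than defaults to accept blindly.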
At minimum, companies concerned about another major event should consider:
The cloud supports and enhances every part of modern businesses, and its strengths and convenience are too compelling to consider going back. Even so, CrowdStrike’s misstep and the cloud-dependent IT landscape that allowed it to turn into a full-blown crisis illustrate how that speed and convenience can turn into a downside without thorough testing and oversight.
Instead of assuming every update that comes your way is a good thing, give yourself time to ask plenty of questions and test plenty of outcomes. You’ll be grateful when you don’t have to individually recover thousands of PCs on a Friday afternoon.