IT outages are always bad news for the organizations hit by them, but only certain outages seem to make actual news headlines. Here are examples of outages that did attract media attention over the past year. See if you can spot the common element they all share:
F1 race: Two Formula One cars collided during a Grand Prix practice race in Italy last month when an outage involving a fibre optic cable knocked out radio communication between drivers and their pit crews.
Inter-nyet: When Russians had spotty access to sites like Google and YouTube in March, the Kremlin blamed it on a data centre fire in France. Some pundits, however, believe the Russian government orchestrated the outages to quash news coverage of dissident Alexei Navalny.
Azure outage: The U.K. government was unable to update its COVID-19 data dashboard for hours last fall when faulty water-cooling pumps took an Azure data centre offline.
All of those outages made headlines because they featured slightly sensationalist circumstances that make for online clickbait. A Formula One crash! Political intrigue! COVID-19 data! In reality, however, something far more mundane is behind a growing number of outages: IT networking and software issues.
The hard numbers behind IT outages
According to Uptime Institute’s third annual survey of data centre owners and operators, these were the top causes of outages in 2020:
- on-site power failure (37%)
- software or IT systems error (22%)
- network issues (17%)
In its report, Uptime Institute also explores outage data based on public sources such as media coverage, social media posts, outage tracking services and statements issued by data centre owners and customers. When you consider outages reported by these public sources, IT network and software issues figure even more prominently as causes of downtime incidents:
- IT software/configuration (42%)
- network IT (21%)
- capacity/demand (12%)
Here’s how Andy Lawrence, executive director of research at Uptime Institute, interprets both sets of outage data in the institute’s annual report: “Networking and configuration issues are emerging as two of the more common causes of service degradation, while power outages are becoming somewhat less of an issue.”
The question is, why?
Behind the IT outage stats
Uptime Institute attributes the rise in IT network and software-related outages to two main trends.
Shift to third-party service providers: More than half (56 per cent) of data centre operators and IT managers polled by Uptime Institute said they experienced an outage over the past three years that was caused by problems at a third-party data centre service provider.
While Lawrence acknowledges in his report that “the growing move to cloud services and extensive use of co-location can increase resiliency and reduce management worries,” he also asserts that “outsourcing brings its own challenges.”
One of those challenges, he writes, is a lack of full visibility into third-party cloud and SaaS services. In addition, he points out that “greater use of public Internet-based services and of complex, multi-site availability zones” means “modern applications and data are spread across and between data centres.”
That brings us to the second factor that Lawrence believes is fuelling an increase in network- and IT software-related outages.
More complex, widely distributed networks: Lawrence says network architecture has become increasingly complicated, and it’s carrying workloads that are much more broadly distributed.
“This rise in outages caused by IT systems and network issues is due to the broad shift in recent years from siloed IT services running on dedicated, specialized equipment to an architecture in which more IT functions run on standard IT systems, often distributed or replicated across many sites,” Lawrence states in his report.
“As more organizations move to cloud-based, distributed IT (driven by a desire for greater agility and automation), the underlying data centre infrastructure is becoming less of a focus or a single point of failure.”
None of that is as sexy as a Formula One race or a rumoured Russian geopolitical conspiracy. It should be newsworthy to enterprise IT managers and network admins, however, especially as COVID-19 continues to push more workloads into the cloud.
It is also why doing your due diligence on third-party and cloud services is increasingly critical.