British Airways' infamous IT crash has pushed the company into the headlines and cost it a very large sum, but what can businesses learn from this catastrophic system failure?
More than a third of British Airways' flights from Heathrow were cancelled after the airline was hit by a power failure that took down its computer systems worldwide, leaving thousands of passengers stranded. According to the company, the root cause of the issue was a problem with a power supply, which affected the IT system. The immediate effect was that operational aspects of the business were unable to function as they should, severely curtailing customer service.
CEO Alex Cruz has suggested that the problems were caused by a power surge in one of the firm's UK data centres, which went on to affect the whole of BA's network. This has led to much speculation about how the firm should manage its servers in the future to prevent such events, and how much of an impact the incident could have on the firm's worth. According to preliminary data, BA's owners have seen £360 million wiped off their value following the incident, and could have to pay customers more than £150 million in compensation.
As is so often the case when large corporates have major IT failures, details have not been forthcoming. We know that BA has six data halls on two sites near the Heathrow Waterside headquarters, but the company won't disclose how the power was disrupted, which piece of kit died, or why the disaster recovery plan failed, with no back-ups bringing the system back to life.
Data centre designers have questioned the explanation provided by Alex Cruz, saying that a power surge should never be able to bring down an entire data centre, let alone its back-up as well.
There is a rumour circulating that the data centre was not affected by a power failure. Rather, engineers were told to apply some security patches to Linux and Windows servers. After applying them, the engineers shut down and attempted to restart the entire data centre. This immediately caused various components, including memory chips and network cards, to fail. Hence the ‘power supply’ issue.
Whether or not this actually happened here, this type of failure does occur, and it highlights the need for a mirrored data back-up facility.
While a power outage or hardware failure at a small or medium-sized business is unlikely to have such an obvious impact on its customers as BA’s IT failure has had on thousands of passengers, it should act as a reminder that even a localised IT issue can have a huge effect on a company’s overall operation. While BA outsources much of its IT operations overseas, and may have been unlucky to have faced such a significant outage in one of its UK data centres, many UK businesses rely solely on private services to hold their data, raising the question of whether they are prepared for potential future power outages.
For those using in-house servers, a power outage could mean both short- and long-term issues. Namely, if data is lost during an outage and no backup plan is in place, the business will undoubtedly suffer, with the worst-case scenario being a complete corporate collapse. This may sound dramatic, but it's a very real possibility for smaller companies that have failed to put an emergency recovery plan in place.
As a result of this risk, many firms choose to outsource at least some of their IT services and data to cloud companies. The main benefit of this approach is that large service providers can often provide greater data security than smaller firms could afford to put in place and maintain. Additionally, large cloud providers regularly offer assurances of backup plans should hardware failures or power outages take place.
It is very rare for a data centre, especially one running cloud services for multiple customers, not to have a complete back-up stored far away, so that even if the centre was hit by, say, a meteorite, the back-up would still kick in within hours at the latest, with minimal disruption.
In order to address this issue, businesses should ensure they have backup plans in place so they can take full advantage of external cloud servers whilst minimising the risk. These plans should include:
Plan for recovery time – It’s likely that your data will be safe even if your cloud provider has a power outage, but you may have to wait for them to recover your information. In the meantime you should have plans in place to ensure as many of your services as possible remain available in the event of an outage.
Plan for multiple failures – It may not simply be an IT outage that affects your business – make sure you plan for as many failures as possible to ensure you suffer the smallest impact should an incident occur.
Consider a hybrid set-up – When critical data is stored and accessed locally, consider backing it up off-site, perhaps encrypted, on a cloud platform that can run identical processes, so that you can switch over in the event of a local failure.
Learn from mistakes – BA now has the opportunity to learn from its mistakes and build a strong system to support its services in the future. Ensure you do the same should you ever suffer an outage, learning from what went wrong and building a more robust system to safeguard the future of your business.
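A mirrored or hybrid back-up only helps if the off-site copy actually matches the live data, so it is worth checking regularly rather than discovering a gap during an outage. The following is a minimal sketch of such a check in Python, not a description of how BA or any cloud provider does it. It assumes both copies are reachable as ordinary directories (for example, the off-site copy has been mounted or restored and decrypted first); the function names `file_digest` and `find_backup_problems` are illustrative only.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(primary: Path, mirror: Path) -> List[str]:
    """List files under `primary` that are missing from, or differ in, `mirror`.

    Assumption: `mirror` is a plain directory tree laid out like `primary`.
    """
    problems = []
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        copy = mirror / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(source) != file_digest(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every local file has an identical off-site copy. Anything else is a prompt to investigate before the back-up is needed in earnest. A check like this only covers the data; whether the off-site platform can actually run your processes still needs a separate recovery test.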