Lessons learned from Arthur C. Clarke’s classic 2001: A Space Odyssey and Stanley Kubrick’s film adaptation, released 50 years ago in 1968, still hold true today. Reliance on technology to run your business (or operation) must be a carefully planned initiative when ‘failure is not an option’. This includes alternate IT processing capability to minimize business disruption when disasters occur.
The 2001 storyline concerns a mysterious artifact (a monolith) deposited by aliens on Earth roughly 2.5 million years ago, in the pre-caveman era, where it appears to push human evolution toward greater intelligence. Fast forward to the 21st century: a second monolith is found on the moon. This discovery prompts a manned mission to Jupiter to seek answers from these unknown alien neighbors.
The spacecraft, known as Discovery One, has all of its mission-critical operations under the control of a supercomputer, the HAL 9000. HAL, which is capable of “mimicking most activities of the human brain,” was considered foolproof, reliable, and incapable of any processing error, including distorting the information (data) stored in its memory. Yet even as efficient as HAL was in the movie, human interaction was still required at times to carry out programming actions (sound familiar?). In the film, the HAL 9000 first became operational in Urbana, Illinois on January 12, 1992. Decades past that date, such computers and manned space travel to the outer solar system remain fictional.
Following best practices for dependence on IT for important business functions, in 2001 Earth’s Mission Control had a duplicate ‘sister’ HAL 9000 computer with the exact same applications, systems, and complete user data, operating in real-time synchronization with the spacecraft so that the two machines’ results could be cross-checked for accuracy.
In 2001, the astronauts discovered that HAL had made an operational error: conflicted over processing priorities, it falsely predicted an imminent antenna failure that never occurred, putting the mission to Jupiter at risk. When queried, HAL insisted the problem was due to “human error”. With the spacecraft’s critical operations in the hands of a computer of uncertain reliability, the crew ultimately terminated HAL’s processor functions and relied on the ground-based sister computer for continued, seamless operations control to ensure the successful completion of the space mission.
Who would have thought that in 1968, when computer technology was still in its infancy, ‘alternate computing capability and data replication’ (IT resiliency) already made good sense for protecting mission-critical operations? Applying the same computing strategies in a business allows any organization to continue its IT processing capability with minimal loss to normal operations.
With businesses’ reliance on technology and automation, functioning without computer systems is ‘risky business’. Today, disaster recovery planning for IT infrastructure must be a strategic initiative starting with C-level management, who are accountable for business survival and for preserving customer and stakeholder interests. The most important task in IT disaster recovery planning is assessing the risk to the organization if computer systems fail for a period of time, impacting production, deliverables, or services to customers. For IT departments, planning starts with collaborating with the business users who can shape the appropriate disaster recovery strategy, considering the following:
- The recovery time objective (RTO): the maximum duration of time within which a critical business application and its supporting systems must be restored after a disaster (or disruption) in order to avoid unacceptable consequences.
- The recovery point objective (RPO): the maximum tolerable period of data loss from an IT service due to a major incident.
Identifying the RTO and RPO can also assist the business in determining potential financial losses as well as the cost of mitigative strategies to protect the business against IT failures.
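To make the cost comparison above concrete, a rough back-of-the-envelope model can weigh downtime exposure against mitigation spend. The function name and all dollar figures below are illustrative assumptions, not figures from this article:

```python
# Hypothetical sketch: estimating financial exposure from RTO and RPO.
# All names and numbers are illustrative assumptions.

def downtime_exposure(rto_hours, revenue_per_hour, rpo_hours, rework_cost_per_hour):
    """Rough exposure if an outage runs to the full RTO, plus the cost
    of re-creating data lost within the RPO window."""
    lost_revenue = rto_hours * revenue_per_hour       # revenue missed while down
    data_rework = rpo_hours * rework_cost_per_hour    # cost to re-enter lost data
    return lost_revenue + data_rework

# Example: a 4-hour RTO at $10,000/hour of revenue, with a 1-hour RPO
# costing $2,000/hour to reconstruct lost transactions.
exposure = downtime_exposure(rto_hours=4, revenue_per_hour=10_000,
                             rpo_hours=1, rework_cost_per_hour=2_000)
print(exposure)  # 42000
```

If the annualized cost of a mitigation strategy (say, a warm site) is well below this exposure multiplied by the expected outage frequency, the investment is easy to justify to management.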
Proven IT resiliency strategies include:
- A hot site (most costly option) is a close duplicate of the existing computer processing capability and includes duplication of data from the primary site. Simply put, it is an alternate IT environment mirroring the current production site in real-time, like 2001: A Space Odyssey.
- A warm site is a less costly option with reduced features and a longer time to configure and bring into operation. It typically has computer hardware similar to that of the original site but may not contain the applications and data, which would be loaded at time of need.
- A cold site is the least expensive option but takes the most time to bring into production, as it has neither the equipment nor the backups in place.
- Managed hosting is a type of Internet hosting in which the client leases an entire server from a third party, not shared with anyone else.
- To address loss of communications for any reason—such as a power outage, PBX failure, broadband connectivity loss, or natural disaster—SIP trunking can automatically redirect your telecom calls to any wireline or wireless telephone number, locally or internationally.
- Cloud computing is delivered by a service provider: you use outsourced computing infrastructure (hardware, software, applications) instead of owning it, thus passing the risk of IT integrity and availability to a third party.
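The hot-site strategy above amounts to exactly what the film depicts: a standby replica that takes over when the primary fails. A minimal sketch of that failover decision, assuming hypothetical site names and a placeholder health check in place of real service probes:

```python
# Hypothetical sketch of failover between a primary site and a hot-site
# replica. Site names and check_health() are illustrative assumptions;
# a real setup would probe actual service endpoints and replication lag.

SITES = ["primary-datacenter", "hot-site-replica"]  # preference order

def check_health(site):
    # Placeholder check: here we simulate an outage at the primary.
    return site != "primary-datacenter"

def active_site(sites):
    """Return the first healthy site in preference order."""
    for site in sites:
        if check_health(site):
            return site
    raise RuntimeError("no healthy site available")

print(active_site(SITES))  # hot-site-replica
```

The key design point, as with HAL’s sister computer, is that the replica must already hold current applications and data; a health check alone is useless if the standby is stale.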
With today’s global environment, 24/7 operations, customer expectations, and reliance on technology, most businesses would find it challenging to function without their computer systems. As a result, disaster recovery planning for IT has become a critical component of business continuity planning.
You can’t predict an emergency or disaster, but you can plan for one.