An article printed in CIO magazine last summer entitled “8 ingredients of an effective disaster recovery plan” by Jennifer Lonoff Schiff (Jul 5, 2016) discussed important DR components with several IT executives and disaster recovery experts who shared advice on how to create a DR plan. Based on our experience at CloudSAFE, you may also wish to consider these additional ingredients for your overall DR plan.
- Know your DR software licensing and provision wisely. Keeping legal copies of software operating during a DR event requires considerable forethought. Whether you are testing applications during a DR exercise or running actual recovery operations, there are times when the same application will require multiple licenses. Face it, paying for duplicate licenses that may never be used is frustrating; paying for them only as you need them is a much better approach. Since licensing varies significantly from vendor to vendor, check with your software vendors to see how this can best be accomplished. You don’t want to be working through licensing issues during an actual disaster, when you need the systems to operate in a separate environment.
- Know the difference between disaster recovery and operational recovery when you define your recovery operations. These two recovery methods use the same parameters (RTO and RPO); however, if your DR site is outsourced, different people, conditions, and possibly different processes are involved in each type of recovery. Operational recovery relies on the technology, staff, and processes of your own data center. When operational recovery is exhausted, or cannot execute because of the effects of an outage, the decision is made to invoke the DR plan and the alternate data center/service. That decision point marks the end of operational recovery and the beginning of disaster recovery.
- Know the difference between SLOs and SLAs; these terms can mean different things to different people. Like RTO and RPO, SLO and SLA need to be defined explicitly. For example, spinning up an application cluster with its data available may or may not satisfy an RTO/RPO clause in a contract. If users can’t connect for any of a myriad of reasons (inability to authenticate or authorize, to get through a firewall, etc.), additional recovery work is required to bring the overall system to a fully recovered state.
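One way to make these definitions explicit is to record, per application, not just the RTO/RPO numbers but a testable definition of "recovered." The sketch below is illustrative only; the field names and the `order-entry` application are hypothetical, not drawn from any standard or from the article.

```python
# Illustrative sketch: an explicit, per-application recovery objective record,
# so an RTO/RPO clause can't be read as merely "the cluster is up."
# All names here are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class RecoveryObjective:
    application: str
    rto_minutes: int          # maximum time to restore service
    rpo_minutes: int          # maximum acceptable data-loss window
    # Explicit, testable definition of "recovered" -- not just "VMs booted".
    recovered_when: list = field(default_factory=list)

orders = RecoveryObjective(
    application="order-entry",
    rto_minutes=240,
    rpo_minutes=15,
    recovered_when=[
        "database reachable with data no older than the RPO window",
        "users can authenticate through the DR-site firewall",
        "an end-to-end test order completes successfully",
    ],
)
print(orders.application, orders.rto_minutes, len(orders.recovered_when))
```

Writing the "recovered when" conditions down this explicitly is what turns a vague SLO into something both parties to an SLA can verify during a test.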
- Include your key supply chain vendors in DR testing; this should be bi-directional. A disaster at either the supplier or the OEM can drastically impact both parties: when either party is having big problems, both parties are having big problems. For this reason, we recommend that Tier 1 applications (see bullet 2 in the article) that involve supplier integration also be included in the DR test. Obviously, tailor this to the type of integration (data only, application/data API integration, etc.) and the amount of stock/inventory available from that supplier; however, Tier 1 isn’t rated as “needed immediately” for nothing!
- If your DR environment is outsourced, extend your change and configuration management process into that outsourced environment, too. Most DR sites are updated at either the application or the infrastructure layer. Since this is an integrated “stack,” it is important to keep every element of the stack in working order when making normal data center changes. Again, trying to determine the correct versions/releases of systems and documentation during a disaster event can be an exasperating experience.
- As part of DR planning, run a system-mapping tool to gather a complete “map” of application “routes,” typical response times, resource utilization, etc., baselining normal behavior for every system that will have a hosted image in a DR environment. This information can be invaluable in diagnosing connectivity and performance problems, either while running in the DR state or while returning the system to its normal state once the original environment is rebuilt.
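A minimal sketch of the response-time part of such a baseline is shown below. It times HTTP probes against a list of application routes and reduces them to a small summary record; the URLs, sample counts, and the crude p95 calculation are all illustrative assumptions, not part of any particular mapping product.

```python
# Hypothetical baselining probe: record response times for application routes
# in the normal environment so the same probes can be compared in the DR site.
import statistics
import time
import urllib.request

def time_request(url, timeout=5):
    """Return round-trip time in seconds for one GET, or None on failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except OSError:
        return None
    return time.monotonic() - start

def summarize(samples):
    """Reduce a list of timings (seconds, None = error) to a baseline record."""
    ok = sorted(s for s in samples if s is not None)
    return {
        "count": len(ok),
        "errors": len(samples) - len(ok),
        "median_ms": round(statistics.median(ok) * 1000, 1) if ok else None,
        # Rough p95: index into the sorted samples (fine for a sketch).
        "p95_ms": round(ok[min(len(ok) - 1, int(len(ok) * 0.95))] * 1000, 1) if ok else None,
    }

if __name__ == "__main__":
    # In practice, probe your own application routes, e.g.:
    #   samples = [time_request("https://app.example.com/login") for _ in range(10)]
    samples = [0.12, 0.15, None, 0.11, 0.14]  # illustrative recorded timings
    print(summarize(samples))
```

Keeping these summaries per route, captured on a schedule, gives you a "normal" profile to compare against when the DR copy of the system behaves oddly.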
- If your DR system is outsourced, do you trust your outsourced provider during times of severe outage, especially with your compliance and security processes? This is a complex topic, and many things can go wrong during a disaster event. Do you have a plan for staffing your own people to test secure systems at all times, both locally and remotely, when dealing with PCI DSS and/or HIPAA data? Do you trust your outsourced provider to give it their all during a disaster event, or will your vendor make decisions based strictly on what the contract says? If outsourcing is part of your DR process, make sure you have thought through these and other questions/situations that may arise during times of trouble.
We’d love to hear from you about other items that should be included, based upon your experience. Please add to the conversation in the comments section below.
How prepared is your organization to protect its data and to quickly recover data and operations in the event a problem does occur? Our brief survey will give you an estimate of how your organization measures up in 9 different areas of data protection and disaster recovery. You'll also receive recommendations for continuous improvement in all 9 facets measured.