A disaster recovery plan (DRP) is a documented, structured approach with instructions for responding to unplanned incidents.
This step-by-step plan consists of precautions that minimize the effects of a disaster so the organization can continue to operate or quickly resume mission-critical functions. Typically, disaster recovery planning involves an analysis of business processes and continuity needs. Before generating a detailed plan, an organization often performs a business impact analysis (BIA) and risk analysis (RA), and it establishes the recovery time objective (RTO) and recovery point objective (RPO).
A disaster recovery strategy should start at the business level and determine which applications are most important to running the organization. The RTO describes the target amount of time a business application can be down, typically measured in hours, minutes or seconds. The RPO describes the point in time to which an application's data must be recovered. In effect, it sets the maximum amount of data, measured as a span of time, that the business can afford to lose.
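One practical consequence of these definitions is that a backup schedule can only honor an RPO if its worst-case data loss fits inside it. The minimal Python sketch below assumes simple periodic point-in-time backups, where a failure just before the next backup loses up to one full interval of data, and it ignores how long the backup itself takes to complete or copy off-site. The function names and the 4-hour RPO and 2-hour RTO are hypothetical values chosen for illustration.

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # Assumption: periodic point-in-time backups. A failure just before the
    # next backup loses everything written since the last one, i.e. up to
    # one full interval. Backup run time and off-site copy time are ignored.
    return backup_interval

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # The schedule honors the RPO only if its worst-case loss fits inside it.
    return worst_case_data_loss(backup_interval) <= rpo

def meets_rto(estimated_recovery_time: timedelta, rto: timedelta) -> bool:
    # Downtime from failure to restored service must not exceed the RTO.
    return estimated_recovery_time <= rto

# Hypothetical application with a 4-hour RPO and a 2-hour RTO
rpo, rto = timedelta(hours=4), timedelta(hours=2)
print(meets_rpo(timedelta(hours=24), rpo))    # nightly backups
print(meets_rpo(timedelta(hours=1), rpo))     # hourly backups
print(meets_rto(timedelta(minutes=90), rto))  # 90-minute restore
```

Under these assumptions, nightly backups cannot satisfy a 4-hour RPO however fast the restore is, which is why the RPO drives backup or replication frequency while the RTO drives restore speed.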
Recovery strategies define, at a high level, how an organization intends to respond to an incident, while disaster recovery plans describe in detail the steps the organization should take to respond.
In determining a recovery strategy, organizations should consider such issues as:
- Resources -- people and physical facilities
- Management's position on risks
Management approval of recovery strategies is important. All strategies should align with the organization's goals. Once disaster recovery strategies have been developed and approved, they can be translated into disaster recovery plans.
Disaster recovery planning steps
The disaster recovery plan process involves more than simply writing the document.
In advance of the writing, a risk analysis and business impact analysis help determine where to focus resources in the disaster recovery planning process. The BIA identifies the impacts of disruptive events and is the starting point for identifying risk within the context of disaster recovery. It also generates the RTO and RPO. The RA identifies threats and vulnerabilities that could disrupt the operation of systems and processes highlighted in the BIA. The RA assesses the likelihood of a disruptive event and outlines its potential severity.
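A common way to combine the two RA dimensions described above, likelihood and severity, is to score each threat as their product and rank the results. The Python sketch below uses that simple risk-matrix heuristic; the 1-to-5 scales, the `Threat` class and the example ratings are hypothetical and stand in for whatever scales and data an organization's own RA produces.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    severity: int    # hypothetical scale: 1 (negligible) to 5 (catastrophic)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-5 scale
        for field_name in ("likelihood", "severity"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Likelihood x severity: a simple, widely used risk-matrix heuristic
        return self.likelihood * self.severity

def rank_threats(threats: list[Threat]) -> list[Threat]:
    # Highest risk first, so planning effort goes where it matters most
    return sorted(threats, key=lambda t: t.score, reverse=True)

# Illustrative ratings only; real values come from the organization's own RA
threats = [
    Threat("Regional power outage", likelihood=4, severity=3),
    Threat("Ransomware attack", likelihood=3, severity=5),
    Threat("River flooding", likelihood=1, severity=5),
]
for threat in rank_threats(threats):
    print(f"{threat.name}: {threat.score}")
```

A multiplicative score is only one option; some organizations weight severity more heavily so that rare but catastrophic events are not ranked too low.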
A DR plan checklist includes the following steps, according to independent consultant and IT auditor Paul Kirvan:
- Establishing the scope of the activity;
- Gathering relevant network infrastructure documents;
- Identifying the most serious threats and vulnerabilities, and the most critical assets;
- Reviewing the history of unplanned incidents and outages, and how they were handled;
- Identifying the current DR strategies;
- Identifying the emergency response team;
- Having management review and approve the disaster recovery plan;
- Testing the plan;
- Updating the plan; and
- Implementing a DR plan audit.
Disaster recovery plans are living documents. Involving employees -- from management to entry-level -- helps to increase the value of the plan.
Creating a disaster recovery plan
An organization can begin its DR plan with a summary of vital action steps and a list of important contacts, so the most essential information is quickly and easily accessible.
The plan should define the roles and responsibilities of disaster recovery team members and outline the criteria to launch the plan into action. The plan then specifies, in detail, the incident response and recovery activities.
Other important elements of a disaster recovery plan template include:
- Statement of intent and DR policy statement;
- Plan goals;
- Authentication tools, such as passwords;
- Geographical risks and factors;
- Tips for dealing with media;
- Financial and legal information and action steps; and
- Plan history.
Scope and objectives of DR planning
A disaster recovery plan can range in scope from basic to comprehensive. Some DRPs run to more than 100 pages.
Disaster recovery budgets can vary greatly and fluctuate over time. Organizations can take advantage of free resources, such as online DR plan templates from SearchDisasterRecovery or the Federal Emergency Management Agency. Several organizations, such as the Business Continuity Institute and Disaster Recovery Institute International, also provide free information and online how-to articles.
A checklist of disaster recovery plan goals includes identifying critical IT systems and networks, prioritizing them by RTO, and outlining the steps needed to restart, reconfigure and recover those systems and networks. At a minimum, the plan should limit the negative effect on business operations. Employees should know basic emergency steps in the event of an unforeseen incident.
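Prioritizing systems by RTO can be as simple as sorting the application inventory so that the shortest RTO is recovered first. The Python sketch below assumes the RTOs have already been set during the BIA; the application names and values are hypothetical, and a real plan would also have to account for dependencies, for example a database that must be restored before the application that uses it.

```python
from datetime import timedelta

# Hypothetical inventory: application name -> RTO established in the BIA
applications = {
    "payroll": timedelta(hours=24),
    "order-entry": timedelta(minutes=15),
    "email": timedelta(hours=4),
    "intranet-wiki": timedelta(hours=72),
}

def recovery_order(apps: dict[str, timedelta]) -> list[str]:
    # Shortest RTO first; ties broken alphabetically so the order is stable
    return sorted(apps, key=lambda name: (apps[name], name))

for position, name in enumerate(recovery_order(applications), start=1):
    print(position, name, applications[name])
```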
Distance is an important, but often overlooked, element of the DR planning process. A disaster recovery site that is close to the primary data center may seem ideal -- in terms of cost, convenience, bandwidth and testing -- but outages differ greatly in scope. A severe regional event can destroy the primary data center and its DR site if the two are located too close together.
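When comparing candidate DR sites, a first rough check is the straight-line distance from the primary data center. The Python sketch below computes great-circle distance with the haversine formula and compares it with a minimum separation. The `min_km` threshold is a hypothetical policy value chosen by the organization, not an industry standard, and the coordinates are placeholders.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0

def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))

def far_enough(primary: tuple[float, float], dr_site: tuple[float, float],
               min_km: float) -> bool:
    # min_km is an organizational policy choice (hypothetical here)
    return great_circle_km(*primary, *dr_site) >= min_km

# Placeholder coordinates for a hypothetical primary site and DR site
primary_site = (41.88, -87.63)
candidate_dr_site = (43.04, -87.91)
print(round(great_circle_km(*primary_site, *candidate_dr_site)), "km")
print(far_enough(primary_site, candidate_dr_site, min_km=100))
```

Distance alone is only a proxy: two distant sites can still share a power grid, a floodplain or a network carrier, so the check complements a review of shared regional risks rather than replacing it.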
Specific types of disaster recovery plans
DR plans can be specifically tailored for a given environment.
- Virtualized disaster recovery plan. Virtualization provides opportunities to implement disaster recovery more efficiently and simply. A virtualized environment can spin up new virtual machine (VM) instances within minutes and provide application recovery through high availability. Testing can also be easier to achieve, but the plan must include the ability to validate that applications can be run in disaster recovery mode and returned to normal operations within the RPO and RTO.
- Network disaster recovery plan. Developing a plan for recovering a network gets more complicated as the complexity of the network increases. It is important to detail the step-by-step recovery procedure, test it properly and keep it updated. Information in this plan will be specific to the network, such as its performance characteristics and the networking staff responsible for it.
- Cloud disaster recovery plan. Cloud-based disaster recovery can range from a file backup in the cloud to a complete replication. Cloud DR can be space-, time- and cost-efficient, but maintaining the disaster recovery plan requires proper management. The manager must know the location of physical and virtual servers. The plan must address security, which is a common issue in the cloud that can be alleviated through testing.
- Data center disaster recovery plan. This type of plan focuses exclusively on the data center facility and infrastructure. An operational risk assessment is a key element in data center DR planning, and it analyzes key components such as building location, power systems and protection, security and office space. The plan must address a broad range of possible scenarios.
Types of disasters
A disaster recovery plan protects an organization from both human-made and natural disasters. There is not one specific way to recover from all kinds of disasters, so a plan should tackle a range of possibilities. A natural disaster may seem unlikely, but if it can happen in the organization's location, the DR plan should address it.
According to independent consultant Edward Haletky, potential disasters to plan for include:
- Application failure
- VM failure
- Host failure
- Rack failure
- Communication failure
- Data center disaster
- Building disaster
- Campus disaster
- Citywide disaster
- Regional disaster
- National disaster
- Multinational disaster
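The scopes in this list are largely nested, which suggests a simple way to check a DR site: a disaster at a given scope takes out both sites if they share the same failure domain at that scope. The Python sketch below models a subset of the list (it omits application, VM and communication failures, which are not nested geographic domains) and assumes each site is described by one identifier per scope. The `Scope` enum, the identifiers and the helper names are hypothetical.

```python
from enum import IntEnum

class Scope(IntEnum):
    # Failure domains from narrowest to widest (a subset of the list above)
    HOST = 1
    RACK = 2
    DATA_CENTER = 3
    BUILDING = 4
    CAMPUS = 5
    CITY = 6
    REGION = 7
    NATION = 8

def exposed_scopes(primary: dict[Scope, str], dr_site: dict[Scope, str]) -> list[Scope]:
    # If both sites share the same domain at a scope, a disaster at that
    # scope can take out the primary and its DR site together.
    return [scope for scope in Scope if primary[scope] == dr_site[scope]]

def smallest_shared_disaster(primary: dict[Scope, str], dr_site: dict[Scope, str]):
    # The narrowest disaster that would hit both sites, or None if none would
    exposed = exposed_scopes(primary, dr_site)
    return min(exposed) if exposed else None

# Hypothetical sites in different campuses of the same city
primary = dict(zip(Scope, ["h1", "r1", "dc1", "b1", "c1", "Chicago", "US-Midwest", "US"]))
dr_site = dict(zip(Scope, ["h7", "r4", "dc2", "b2", "c2", "Chicago", "US-Midwest", "US"]))
print([scope.name for scope in exposed_scopes(primary, dr_site)])
print(smallest_shared_disaster(primary, dr_site).name)
```

In this example, the DR site protects against campus-level events but not against a citywide disaster, which is exactly the proximity risk described in the discussion of distance.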
Testing your disaster recovery plan
DR plans are substantiated through testing, which identifies deficiencies and provides opportunities to fix problems before a disaster occurs. Testing can offer proof that the plan is effective and hits RPOs and RTOs. Since IT systems and technologies are constantly changing, DR testing also helps ensure a disaster recovery plan is up to date.
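To show whether a test actually hit its objectives, the timestamps recorded during the test can be compared against the RTO and RPO. The Python sketch below assumes the test records three times: when the simulated failure began, when the application was usable again, and the timestamp of the newest data present after recovery. The `DrTestResult` class, the application name and all times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DrTestResult:
    app: str
    rto: timedelta
    rpo: timedelta
    failure_at: datetime             # when the simulated outage began
    restored_at: datetime            # when the application was usable again
    newest_recovered_data: datetime  # timestamp of the latest data present after recovery

    @property
    def downtime(self) -> timedelta:
        return self.restored_at - self.failure_at

    @property
    def data_loss(self) -> timedelta:
        return self.failure_at - self.newest_recovered_data

    def findings(self) -> list[str]:
        # Each entry is a deficiency to fix before a real disaster occurs
        issues = []
        if self.downtime > self.rto:
            issues.append(f"{self.app}: RTO missed (downtime {self.downtime} > {self.rto})")
        if self.data_loss > self.rpo:
            issues.append(f"{self.app}: RPO missed (data loss {self.data_loss} > {self.rpo})")
        return issues

result = DrTestResult(
    app="order-entry",  # hypothetical application
    rto=timedelta(hours=1),
    rpo=timedelta(minutes=15),
    failure_at=datetime(2024, 1, 6, 2, 0),
    restored_at=datetime(2024, 1, 6, 3, 30),
    newest_recovered_data=datetime(2024, 1, 6, 1, 50),
)
for issue in result.findings() or ["all objectives met"]:
    print(issue)
```

Here the recovered data is recent enough to meet the RPO, but the 90-minute restore misses the 1-hour RTO, the kind of deficiency a test is meant to surface.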
Reasons given for not testing DR plans include budget restrictions, resource constraints and a lack of management approval. Disaster recovery testing takes time, resources and planning. Testing can also introduce risk if it involves live data.
DR testing can range from simple to complex. In a plan review, a detailed discussion of the disaster recovery plan looks for missing elements and inconsistencies. In a tabletop test, participants walk through plan activities step by step to demonstrate whether disaster recovery team members know their duties in an emergency. A simulation test uses resources such as recovery sites and backup systems in what is essentially a full-scale test without an actual failover.