Back in 1996, one of my jobs was to ensure backups were run on our systems. At the time, we didn’t have many servers, nor did we have much more than a few hundred gigabytes of data. Some backup software and some DLT tapes were all that we needed.
Once the digital revolution hit medical imaging, our disk consumption skyrocketed into many hundreds of terabytes. Our little backup server and its one tape drive quickly became a liability to the business: there was no way to back up that much data within a 24-hour period, and we just didn’t have the space to deploy a robotic tape array (as many similar organizations were doing at the time). Our tolerance for data loss was low, so we stood up a storage area network, boot-from-SAN capabilities, and a secondary datacenter, and implemented asynchronous disk-array replication and “business continuance volumes.”
In 2018, the options available for disaster recovery (DR) are greater than they were in 1996. Through the years, however, the basic questions for disaster recovery haven’t changed: what file types are being backed up, where are those backups being stored, how often are the backups occurring, and, most importantly, are those backups being tested?
Engage the business
Before deciding on a solution to implement a disaster recovery strategy, it’s important to understand what the business requirements are for disaster survivability.
• What level of data loss is an organization willing to tolerate when restoring from a disaster? For lower data loss, a smaller RPO, or recovery point objective, will be needed.
• After a disaster, does the organization want to return to business operations quickly? If so, then a small RTO, or recovery time objective, will be needed. If cloud is your DR strategy, make sure to take into account latency in getting your data back on premise.
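To make these two objectives concrete, here is a minimal sketch that checks whether the newest backup still satisfies a given RPO. The function name and the one-hour RPO are hypothetical, invented for illustration rather than taken from any backup product:

```python
from datetime import datetime, timedelta

# Hypothetical helper: does the newest backup satisfy the recovery point
# objective? If (now - last_backup) exceeds the RPO, a disaster right now
# would lose more data than the business agreed to tolerate.
def rpo_compliant(last_backup: datetime, rpo: timedelta,
                  now: datetime) -> bool:
    return (now - last_backup) <= rpo

now = datetime(2018, 6, 1, 12, 0)
# Against a 1-hour RPO: a backup from 45 minutes ago passes,
# one from 3 hours ago fails.
print(rpo_compliant(now - timedelta(minutes=45), timedelta(hours=1), now))  # True
print(rpo_compliant(now - timedelta(hours=3), timedelta(hours=1), now))     # False
```

The RTO is checked the same way, but against the measured time to bring systems back, which is one reason the restore testing discussed later matters so much.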
Once you’ve decided on how frequently you need to back up your systems, you need to decide what information should be backed up:
• Databases, physical servers, virtual machines, and unstructured files should all be under consideration.
• Are there different requirements for users’ files versus other unstructured data types? Your mission-critical data may be backed up hourly, but those MP3 files sitting in a user’s home directory are far less critical, and may only be backed up weekly (if at all!).
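One way to encode that kind of per-file-type policy is a simple map from file extension to backup frequency. This is a hypothetical sketch, not any vendor’s policy engine; the extensions and tiers are illustrative:

```python
import os

# Hypothetical policy: mission-critical database files hourly, ordinary
# documents daily, and users' media files weekly.
BACKUP_POLICY = {
    ".mdf": "hourly",   # SQL Server data files
    ".docx": "daily",
    ".mp3": "weekly",
}

def backup_frequency(path: str, default: str = "daily") -> str:
    # Unlisted file types fall back to the default tier.
    ext = os.path.splitext(path)[1].lower()
    return BACKUP_POLICY.get(ext, default)

print(backup_frequency("/data/erp/prod.mdf"))        # hourly
print(backup_frequency("/home/alice/favorite.mp3"))  # weekly
```

A real backup product would layer include/exclude rules and directory scoping on top, but the underlying decision is this simple lookup.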
And finally, where to back up your systems:
• On-premise, if you are fortunate enough to operate a second datacenter.
• Off-premise: disaster-recovery-as-a-service, or DRaaS, providers.
• A combination of the above.
After the business requirements have been defined, there are quite a few ways to accomplish your DR strategy, no matter what combination of when/what/where you have decided on. Solution cost, the business tolerance for downtime, frequency of file access, importance of files, and performance considerations (in the case of Internet latency) are all additional considerations when looking at DR solutions. For example, less frequently accessed files that need to be kept a long time for compliance reasons might be a good candidate for cloud archive tiers, like Amazon’s Glacier.
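A tiering rule like the Glacier example can be expressed as a small decision function. The storage-class names below are Amazon S3’s; the access-age and retention thresholds are invented for illustration and would come from your own requirements:

```python
def choose_storage_class(days_since_access: int, retention_years: int) -> str:
    # Rarely accessed files with long compliance retention belong in an
    # archive tier; moderately cold data in infrequent-access; everything
    # else stays in the standard tier. Thresholds are illustrative only.
    if days_since_access > 90 and retention_years >= 7:
        return "GLACIER"
    if days_since_access > 30:
        return "STANDARD_IA"
    return "STANDARD"

print(choose_storage_class(days_since_access=365, retention_years=7))  # GLACIER
print(choose_storage_class(days_since_access=45, retention_years=1))   # STANDARD_IA
print(choose_storage_class(days_since_access=2, retention_years=1))    # STANDARD
```

Remember that archive tiers trade retrieval speed for cost: data parked in Glacier can take hours to retrieve, which feeds directly back into your RTO.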
Let’s look at the different types of data you’ll likely want to recover in the event of a disaster.
How much data do you have? If you have a petabyte of large files, and your disk consumption is increasing annually, consider storing those in the cloud. Cloud vendors have varying levels of data protection schemes (at different price points, naturally) to recover your data in the event of a disaster.
If the decision is made to keep these file types on premise, array-based replication to a secondary location would suffice for DR purposes. You could also take a hybrid approach. Companies such as Cohesity will back up your data on premise as well as replicate it to a cloud target. There are disk arrays on the market today that will tier files out to the cloud, as well as software solutions that can virtualize the storage your applications write to. This last approach allows you, for example, to write a file copy to an on-premise primary disk array, and also perhaps to a lower-cost on-premise disk array for DR purposes, and yet a potential third copy to the cloud for archive purposes. You can mix and match storage vendors. Your application will not care where the file is being retrieved from, as that magic is being handled by the software.
Continuous data protection, or CDP, can be great for recovering virtual machines (VMs) from a very specific point in time. However, the trade-off is the large amount of disk space used to track those down-to-the-minute, granular VM changes. Contrast this with VM snapshots, which use pointers to create a recoverable copy of your VM at the time the snapshot is taken.
These snapshots can remain local, or be replicated to a secondary site in addition to the local snapshot.
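The pointer-based behavior of snapshots can be illustrated with a toy copy-on-write volume: the snapshot records references to the blocks that exist at snapshot time, and only writes made afterward diverge from it. This is a deliberately simplified sketch, not how any particular hypervisor implements snapshots:

```python
class Volume:
    """Toy block device: maps block address -> data."""
    def __init__(self):
        self.blocks = {}

    def write(self, addr: int, data: str) -> None:
        self.blocks[addr] = data

    def snapshot(self) -> dict:
        # A snapshot is just a frozen map of pointers to the current
        # blocks; no block data is copied at snapshot time.
        return dict(self.blocks)

vol = Volume()
vol.write(0, "boot")
vol.write(1, "data-v1")
snap = vol.snapshot()
vol.write(1, "data-v2")   # the live volume diverges after the snapshot

print(vol.blocks[1])  # data-v2
print(snap[1])        # data-v1  (recoverable point-in-time copy)
```

This is why snapshots are cheap to take but are not, by themselves, backups: the snapshot shares storage with the live volume, which is why replicating them to a secondary site matters.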
There are also solutions that will back up your VMs to the cloud; some will even allow you to spin up those VMs in the cloud from within the backup solution’s interface, should you happen to have lost a portion (or all) of your compute infrastructure.
Native MSSQL and Oracle backup tools can create performance problems for end users and are difficult to manage without having someone dedicated to them. This is especially problematic in organizations that run 24x7 and have no concept of a backup window. The same tools I’ve seen for managing the backups of VMs also have ties into MSSQL and Oracle that are much more elegant than their native backup tools: for example, taking a point-in-time image of the VM and backing up the database from that snapshot.
While our organization is 99 percent virtual, we still have some specialized applications that require physical servers. DR/backup software vendors that focused on VM protection first have added capabilities to support physical servers. Specific use cases may allow recovery to a virtual host; however, if your physical servers have very specific hardware requirements, that may not be a possibility. In these cases, onsite spare hardware may be required, depending on how quickly you need to recover.
But whatever you do…
Test, test, TEST your ability to restore from your backups. Restore your database backup to another machine instance and ensure it passes its consistency checks, at the least. Ensure you can hit your RTO and RPO requirements. Restore your virtual machines to a segregated network and make sure they boot up without bluescreening. If you have the infrastructure to do so, conduct a “live fire” exercise: execute your DR plan and run your systems as if you were in a disaster. When your exercise is completed, restore operations back to your primary datacenter. This type of DR exercise takes significant planning and coordination from your different business units, but gives you the benefit of knowing your entire corporate business continuity plan, beyond the technical aspects, is firing on all cylinders.
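Even a simple automated check is better than none. The sketch below restores nothing real; the data and names are hypothetical, and it only shows the shape of a checksum-based verification that a restored file matches what was backed up, before you run deeper consistency checks (such as DBCC CHECKDB for MSSQL):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Digest used to fingerprint a backup at backup time."""
    return hashlib.sha256(data).hexdigest()

# At backup time, record a digest alongside the backup itself.
original = b"production database export"          # stand-in for real backup data
recorded_digest = sha256_of(original)

# At test time, restore to a scratch location and compare digests before
# trusting the backup or running application-level consistency checks.
restored = b"production database export"          # stand-in for restored data
if sha256_of(restored) == recorded_digest:
    print("restore verified")
else:
    print("restore corrupt: investigate before trusting this backup")
```

Scripting this comparison into your backup pipeline turns restore testing from an annual event into a routine check, and timing the restore step gives you real numbers for your RTO.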
In conclusion, DR solutions on the market today have far greater capabilities than those from 20 years ago. Being able to leverage the cloud as a DR and/or backup target brings flexibility to solution choices, but you should ensure the solution you choose is designed to meet the requirements the business has set forth.