Saturday, November 7, 2009

The Sun Sets on Disaster Recovery (Finally)

Big changes have to come in small doses. Those of us fortunate enough to have several years' experience with SOA and utility computing can see so many of the great things cloud can do, and how it really addresses so many of the complexities within IT. I often explain that part of the tremendous value of Cloud Computing lies in trapping complexity within layers of abstraction so we don't expose limitations. However, I see one of the biggest killer apps for Cloud Computing, business continuity, as not only trapping disaster recovery within the infrastructure layer but doing away with disaster reactivity entirely!

First, not every failure is a disaster. Failures can and do occur, and we need to be smarter about how we engineer our solutions. Our focus should be on automated recovery, an option that becomes a real solution in a cloud world. If a service dies, another instance should be started. If hardware fails, jobs should move to alternate hardware. Creating heat maps for failover can go a long way toward identifying and targeting the areas where failure recovery needs to be addressed and, hopefully, automated. A disaster, however, is a large-scale failure, and for it we so often employ a different set of tools. Why? Primarily because of our legacy silo approach: if one silo dies, we need to move data and jobs to an alternate silo. In a cloud architecture we don't have silos (even in a virtualized architecture, the silo is a logical rather than a physical manifestation). So?
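To make the automated-recovery idea concrete, here is a minimal Python sketch, assuming an in-memory list of hosts that each carry a health flag and a list of job names. `Host`, `rebalance`, and the `Counter`-based heat map are hypothetical names for illustration, not part of any real scheduler; a production system would learn health from monitoring and move work through its orchestration engine.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical host model: a name, a health flag, and the jobs it runs.
    name: str
    healthy: bool = True
    jobs: list = field(default_factory=list)


def rebalance(hosts, heat_map):
    """Move every job off unhealthy hosts onto the least-loaded healthy host.

    heat_map is a Counter of how many jobs were relocated away from each
    failed host: a crude stand-in for a failover heat map.
    """
    healthy = [h for h in hosts if h.healthy]
    if not healthy:
        # Nothing left to fail over to: this is the large-scale case.
        raise RuntimeError("no healthy hosts remain")
    for host in hosts:
        if host.healthy:
            continue
        for job in host.jobs:
            # Pick the healthy host with the fewest jobs right now.
            target = min(healthy, key=lambda h: len(h.jobs))
            target.jobs.append(job)
            heat_map[host.name] += 1
        host.jobs = []


hosts = [Host("a", jobs=["j1", "j2"]), Host("b", jobs=["j3"]), Host("c")]
hosts[0].healthy = False  # simulate a hardware failure on host "a"
heat = Counter()
rebalance(hosts, heat)
print({h.name: h.jobs for h in hosts}, dict(heat))
```

Accumulating those relocation counts over time is one simple way to see which hosts fail most often and where recovery effort should go first.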

Once we architect our solutions to be service-oriented and distributed from the start, we lessen the impact of all failures, from the simple to the theatrical. Losing a data center is bad. However, if our solution is already load balancing across data centers, and we ensure by business rule that we always have services available in each center, then our exposure is limited to in-flight transactions and sessions. If we have a true cloud infrastructure, we should already have the network bandwidth required to perform database mirroring, in which case the disruption from losing a data center is as small as we are able to make it.
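The business rule that every data center keeps live services can be sketched as a reconciliation step plus a router that skips failed centers. This is a minimal Python sketch under stated assumptions: `running` maps a data-center name to its live instance IDs, and `start_instance` is a hypothetical provisioning hook, not a real cloud API.

```python
import itertools


def reconcile(running, min_per_dc, start_instance, down=frozenset()):
    """Start instances until every reachable data center has min_per_dc.

    running: dict of data-center name -> list of live instance ids.
    start_instance: hypothetical provisioning hook, callable(dc) -> id.
    down: data centers currently lost; nothing can be started there.
    Returns the (dc, instance) pairs that were started.
    """
    started = []
    for dc, live in running.items():
        if dc in down:
            continue
        while len(live) < min_per_dc:
            instance = start_instance(dc)
            live.append(instance)
            started.append((dc, instance))
    return started


def route(running, ticket, down=frozenset()):
    """Round-robin one request across instances in data centers still up."""
    pool = [i for dc, live in sorted(running.items()) if dc not in down
            for i in live]
    if not pool:
        raise RuntimeError("no data center available")
    return pool[ticket % len(pool)]


ids = itertools.count()
running = {"east": ["east-a"], "west": []}
reconcile(running, 2, lambda dc: f"{dc}-{next(ids)}")
# After losing "east", new requests are routed only to "west".
for ticket in range(2):
    print(route(running, ticket, down={"east"}))
```

Round-robin by ticket number is simply the easiest stateless choice for a sketch; a real balancer would also weigh latency and load.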
Further advances will come to light in the next few years as databases tackle the federation issue and learn to manage data in logical instances rather than physical domains.

None of this happens, however, without intent.

We need a business case. What is the lost business opportunity per hour of downtime? For many business-critical systems this value is calculable, and I would argue that if it's not, then there's no reason for recovery. Our solution cost needs to be a small percentage of that potential loss. Today disaster recovery is EXPENSIVE: hot sites and recovery contracts, tapes to retrieve from an off-site location and restore, staff to move around, and periodic tests that always end with multiple failures. According to the Symantec 5th Annual IT Disaster Recovery Survey of June 2009, the average annual budget for disaster recovery is $50M. Consider that against the cost of 50TB of storage on Amazon S3: about $90k a year. WOW! So in one fell swoop we can improve recovery speed and accuracy and reduce cost by eliminating tapes, backups, tape recoveries, off-site storage, and their administration costs. And it only gets better!
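The business case is simple arithmetic, and the sketch below reproduces both the storage figure and the "small percentage of potential loss" test in Python. The price of $0.15 per GB-month is an assumption (roughly Amazon S3's first-tier list price at the time, ignoring request and transfer charges); the 10% threshold and the example downtime numbers are hypothetical placeholders, not figures from the survey.

```python
GB_PER_TB = 1024
MONTHS_PER_YEAR = 12


def annual_storage_cost(tb, usd_per_gb_month):
    """Yearly cost of keeping `tb` terabytes in flat-priced object storage."""
    return tb * GB_PER_TB * usd_per_gb_month * MONTHS_PER_YEAR


def worth_it(loss_per_hour, downtime_hours_per_year, solution_cost,
             max_fraction=0.10):
    """True if the solution costs at most max_fraction of the expected loss.

    max_fraction is a hypothetical threshold for "a small percentage".
    """
    expected_loss = loss_per_hour * downtime_hours_per_year
    return solution_cost <= max_fraction * expected_loss


storage = annual_storage_cost(50, 0.15)  # assumed $0.15/GB-month
print(f"50TB for a year: ${storage:,.0f}")
# Hypothetical system losing $250k per hour, with 8 hours of downtime a year.
print(worth_it(250_000, 8, storage))
```

At the assumed price, 50TB comes to about $92k a year, consistent with the roughly $90k quoted above; whether that counts as "small" depends entirely on the loss-per-hour figure the business case supplies.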

Moving into the cloud, we take advantage of all the tools and capabilities that already exist, from service directories and virtual machines to provisioning and orchestration engines, schedulers, and service level managers. We move out of Disaster Recovery and into Business Continuity. The focus shifts from recovering business systems against a Recovery Time Objective and Recovery Point Objective to providing near-seamless continuity by recovering services and virtual machines and cloudbursting to obtain needed resources. Is the cloud ready today? It's pretty darn close. Consider that Oracle's ERP solution can back up to and recover from cloud storage. According to Symantec's survey, the three key hurdles in the virtualized world are storage management tools to protect data and applications, resource constraints that make it hard to back up virtual environments, and the fact that a third of organizations today don't back up their virtual environments.
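Cloudbursting, as used here, means serving demand from our own capacity first and borrowing the shortfall from a cloud pool. A minimal Python sketch of that placement decision follows; `plan_capacity`, `Placement`, and the unit of "capacity" (for example, VM instances) are illustrative assumptions, and a real implementation would call a provisioning engine to acquire the burst.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    local: int  # units served from our own (surviving) capacity
    burst: int  # units borrowed from the cloud pool


def plan_capacity(demand, local_capacity, max_burst):
    """Fill local capacity first, then burst the remainder to the cloud.

    Raises RuntimeError if even the burst ceiling cannot cover demand.
    """
    if min(demand, local_capacity, max_burst) < 0:
        raise ValueError("capacities and demand must be non-negative")
    local = min(demand, local_capacity)
    burst = demand - local
    if burst > max_burst:
        raise RuntimeError(f"short by {burst - max_burst} units")
    return Placement(local, burst)


# Normal day: local capacity covers everything.
print(plan_capacity(demand=100, local_capacity=120, max_burst=50))
# One data center lost: local capacity halves, the cloud covers the gap.
print(plan_capacity(demand=100, local_capacity=60, max_burst=50))
```

Filling local capacity first keeps the burst, and its cost, proportional to the size of the failure rather than to total demand.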
Whether it arrives today or tomorrow, business continuity brought about by cloud concepts is on the horizon, and it is a target everyone should be shooting for. It saves money and time and reduces risk. What's not to love?


