We, like hopefully most businesses, have our data stored on disaster-proof hard drives as well as replicated out to our Dallas office (where I safely rode out the storm). For businesses in the greater Houston area, the "P" part of Disaster Recovery Planning is now being tested. A lot of work goes into getting the data out, and even into testing, but DR tests are often a "pass/fail" sort of occurrence. It looks like Ike is going to be a longer-term situation.
We have all heard the statistics on businesses that close for good after a major disaster. Many times, while the disaster shuts the doors, it's the prolonged time away that puts the lock on them. So after the obvious first step of getting the data out of the facility, one that I and countless others have written about, comes living with the DR site's capabilities.
Having the data safe is only half the battle. Doing something with that data, and for a prolonged period of time, is the other half. Here are some of the challenges I am hearing from the users I am talking to.
First, a common mistake is not getting the data, or the DR location itself, far enough away. Right now, much of Houston is without power, and unless your facilities have generators, it's possible that both your primary and your DR site are down.
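As a rough illustration of the distance check, here is a minimal Python sketch that computes the great-circle distance between a primary and a DR site and compares it to a minimum separation. The coordinates are approximate city centers, and the 300 km threshold is purely a hypothetical placeholder; the right separation depends on the kinds of regional events you are planning for.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(primary, dr_site, min_separation_km):
    """True if the two (lat, lon) sites are at least min_separation_km apart."""
    return great_circle_km(*primary, *dr_site) >= min_separation_km


# Approximate city-center coordinates (assumptions for illustration only).
HOUSTON = (29.7604, -95.3698)
DALLAS = (32.7767, -96.7970)

# Hypothetical threshold; pick one that reflects your own regional risks.
MIN_SEPARATION_KM = 300.0

if __name__ == "__main__":
    distance = great_circle_km(*HOUSTON, *DALLAS)
    print(f"Separation: {distance:.0f} km, "
          f"far enough: {far_enough(HOUSTON, DALLAS, MIN_SEPARATION_KM)}")
```

Distance alone does not cover shared dependencies such as a common power grid or network carrier, so treat a check like this as a first filter rather than a verdict.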
Second, many businesses replicated to a cheaper storage system, possibly one that is less reliable and lower performing. There is serious concern about running mission-critical apps on these systems for the next few weeks. When you are designing your disaster recovery environment, make sure the storage in the remote location can sustain the performance and reliability that your applications demand.
Third, some customers "over-virtualized" their DR site. One of server virtualization's big appeals is that it lets you put less hardware in the DR site by virtualizing much of the server environment there. Is the virtual infrastructure able to sustain operations for a few weeks, or will that environment be overburdened? This morning, customers are scrambling to order additional hardware to spread out this load. As part of your DR testing, make sure you don't overburden the DR servers, and make sure you know who to call to quickly bring up another virtualization host and distribute that load.
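To make the over-virtualization question concrete, the following Python sketch estimates how heavily a pool of DR hosts would be loaded and how many more hosts it would take to get back under a utilization ceiling. Everything here is an assumption for illustration: the `Host` and `VM` types, the identical-host simplification, the sample sizes, and the 70% ceiling are hypothetical, and a real plan would use measured demand from your own environment.

```python
import math
from dataclasses import dataclass


@dataclass
class Host:
    cpu_ghz: float  # usable CPU capacity of one DR host
    ram_gb: float   # usable memory of one DR host


@dataclass
class VM:
    name: str
    cpu_ghz: float  # sustained CPU demand
    ram_gb: float   # sustained memory demand


def utilization(host, host_count, vms):
    """Return (cpu_fraction, ram_fraction) of pooled capacity the VMs would consume."""
    used_cpu = sum(vm.cpu_ghz for vm in vms)
    used_ram = sum(vm.ram_gb for vm in vms)
    return used_cpu / (host.cpu_ghz * host_count), used_ram / (host.ram_gb * host_count)


def extra_hosts_needed(host, host_count, vms, ceiling=0.70):
    """Additional identical hosts required to keep CPU and RAM under the ceiling."""
    used_cpu = sum(vm.cpu_ghz for vm in vms)
    used_ram = sum(vm.ram_gb for vm in vms)
    required = max(
        math.ceil(used_cpu / (host.cpu_ghz * ceiling)),
        math.ceil(used_ram / (host.ram_gb * ceiling)),
    )
    return max(0, required - host_count)


if __name__ == "__main__":
    # Hypothetical DR site: two hosts asked to carry twenty production VMs.
    dr_host = Host(cpu_ghz=32.0, ram_gb=128.0)
    workload = [VM(f"vm{i}", cpu_ghz=4.0, ram_gb=16.0) for i in range(20)]
    cpu, ram = utilization(dr_host, 2, workload)
    print(f"CPU {cpu:.0%}, RAM {ram:.0%}, "
          f"extra hosts needed: {extra_hosts_needed(dr_host, 2, workload)}")
```

Running a calculation like this during DR testing, rather than the morning after a disaster, tells you ahead of time how many hosts you would have to order to sustain operations for several weeks.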
Fourth, there are many instances of no backup at the DR site! The DR site just became production for many businesses, and once it did, it needed protection of its own. There are many sites running right now without the safety net of a backup. As part of your planning, make sure you keep the backup capability current with the needs of the DR site. Also make sure you have the ability to replicate data out of your DR site to another site, just in case you're unlucky enough to have a problem at your DR site, too.
One of the critical elements in disaster planning is to plan for a long-term activation of your DR site. Always factor in at least four weeks of remote operation and make sure your DR site is designed to handle that workload.
The final step is planning for failback. One of the most challenging steps, and typically the least planned or tested, is the failback, the process of returning production to its original location after a disaster. We'll discuss that in the next entry.
Track us on Twitter: http://twitter.com/storageswiss.
George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.