As drive capacities increase, rebuilding a RAID set after a drive failure can now take days in some cases. The chance of a second or even third drive failing during that rebuild also increases. There is also the impact on performance during the rebuild process: the more storage processing you allocate to the rebuild effort, the faster the rebuild completes but the slower the application performs. If you allocate more processing to the application instead, the rebuild slows down and you are exposed to additional drive failures for a longer period of time.
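To make the scale of the problem concrete, here is a back-of-the-envelope estimate of the raw rebuild window. The drive capacity and sustained rebuild rate below are illustrative assumptions, not figures from any vendor, and real rebuilds are usually throttled further to protect application performance:

```python
# Rough rebuild-time estimate; both figures are illustrative assumptions.
drive_capacity_bytes = 16 * 10**12   # a hypothetical 16 TB drive
rebuild_rate_bps = 100 * 10**6       # assumed 100 MB/s sustained rebuild rate

rebuild_seconds = drive_capacity_bytes / rebuild_rate_bps
rebuild_hours = rebuild_seconds / 3600
print(f"Estimated rebuild time: {rebuild_hours:.1f} hours")  # ~44.4 hours
```

Even under this best case of nearly two days, halving the rebuild rate to favor the application doubles the exposure window to almost four.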
As we discuss in our recent article "What's Missing From Your Disaster Recovery Plan?", application or operating system clusters often won't help much here. Most rely on shared storage, so if that storage fails, there is a good chance your application cluster just failed along with it. Most operating-system-level clustering technologies won't detect a specific application failure, nor will they monitor performance conditions.
There are a few ways to protect your application from its storage. The first is a better storage system with multiple (more than two) controllers that are resilient to a storage software failure, meaning you can roll a storage software upgrade across each processor. There is also a growing number of backup applications that allow data to be served directly from the backup device. The third option is to use failover applications that ensure application data is written to two separate storage systems at the same time. The use of such software would allow the deployment of a more mid-range storage solution in place of an enterprise-class storage system. Most of these software solutions work across applications and do not require special versions of operating systems. Some are even application aware, so they can detect an in-application failure or performance degradation.
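The article does not describe any particular product's internals, so the following is only a minimal sketch of the dual-write idea behind that third option: every write must land on both storage targets before it is acknowledged. Two local files (hypothetical `primary.dat` and `secondary.dat`) stand in for two separate storage systems:

```python
import os
import tempfile

class MirroredWriter:
    """Minimal sketch of synchronous dual-write mirroring: a write is
    only considered successful once it has reached both targets."""

    def __init__(self, primary_path, secondary_path):
        self.targets = [open(primary_path, "ab"), open(secondary_path, "ab")]

    def write(self, data: bytes) -> None:
        # Write to both targets; if either write fails, the exception
        # propagates to the application instead of the copies silently
        # diverging.
        for f in self.targets:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the data to stable storage

    def close(self) -> None:
        for f in self.targets:
            f.close()

# Usage sketch: mirror one write across the two stand-in "storage systems".
workdir = tempfile.mkdtemp()
primary = os.path.join(workdir, "primary.dat")
secondary = os.path.join(workdir, "secondary.dat")

writer = MirroredWriter(primary, secondary)
writer.write(b"application data")
writer.close()
```

Real products layer failure detection and failover on top of this: if the primary becomes unavailable, reads and writes continue against the surviving copy.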
Armed with this level of resiliency, applications can be kept available even if the worst-case local disaster occurs: a storage system failure. Too often we focus on getting data out of the data center, when in reality the data center is fine. It's these inside-the-data-center failures that really get you into trouble, and a software-based tool is something to look into to make those troubles go away.
George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.