Vibrations Part II

In my last entry we opened up a can of worms around drive vibration, discussing what it is and how it occurs. Vibration exists, but why should you, the IT professional, care? This stuff is all on RAID 5, right? Why do you care if a drive fails?

As I said in the previous entry, up to 70% of the drives returned to drive manufacturers are fine. Many can be powered back on and, with no additional effort, pass all diagnostic tests. In short, these were false failures. Many of the remaining "failed" drives can simply be reconditioned by recalibrating the heads, rewriting the servo tracks, and performing a low-level format, after which they pass all diagnostics.

What do these false failures or recoverable failures cost you? Time and possibly data. You have to pull the drive, put a new drive in, rebuild the RAID set, pack up the old drive, and either give it to the supplier's engineer or send it back to the manufacturer. It's an annoyance that takes time away from IT professionals who are already stretched too thin, especially when all this time is spent on something that, in up to 70% of cases, should never have occurred in the first place!

Back to vibration. What are the effects of vibration?

First, it causes heads to move horizontally off track or to fly too high or too low above the platter. This results in read and write errors, which in turn cause excessive retries that slow data access. It can even lead to data loss, because the outbound data is unreadable or the inbound data unwritable.

This vibration can then cause early drive failure because of excessive read and write errors. In truth, the problem may not be with that drive at all. One of the adjacent drives in a drive bay may actually be creating enough vibration to cause failures in neighboring drives. Ever have a drive slot that seems to have recurring failures? The problem may not be that bay or that slot; it may be the adjacent drive.
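
The recurring-slot pattern described above lends itself to a quick check against your own failure records. What follows is a minimal Python sketch, not a vendor tool. It assumes you keep a simple list of the slot number for each drive you have replaced in one shelf, and that the drives sit in a single row so that slot N neighbors slots N-1 and N+1. The function name `suspect_slots`, its parameters, and the threshold of two neighbor failures are all hypothetical choices made for illustration.

```python
from collections import Counter


def suspect_slots(failed_slots, num_slots, min_neighbor_failures=2):
    """Flag slots whose immediate neighbors have failed repeatedly.

    failed_slots: one 0-based slot number per recorded drive failure.
    num_slots: number of slots in the shelf (assumed to be a single row).
    min_neighbor_failures: hypothetical threshold for raising suspicion.
    Returns a dict mapping each suspect slot to its neighbor failure count.
    """
    failures = Counter(failed_slots)
    suspects = {}
    for slot in range(num_slots):
        # Only the slots directly to the left and right count as neighbors.
        neighbors = [s for s in (slot - 1, slot + 1) if 0 <= s < num_slots]
        neighbor_failures = sum(failures[s] for s in neighbors)
        if neighbor_failures >= min_neighbor_failures:
            suspects[slot] = neighbor_failures
    return suspects


if __name__ == "__main__":
    # Illustrative history: drives replaced in slots 3 and 5 of an 8-slot shelf.
    print(suspect_slots([3, 5], num_slots=8))
```

With failures logged in slots 3 and 5, the sketch points at slot 4, the drive sitting between them. A flagged slot is only a hint about where to look, not proof that the drive there is the source of the vibration.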

The bigger fear is that the drive with excessive vibration does not fail right away and instead causes the drives on either side of it to fail. The result is a complete RAID failure, and a golden opportunity to test how well your backup is working.

I have even seen a case where this problem caused a triple drive failure, breaking even a RAID 6 protection scheme. Drive 4 was vibrating excessively. Drive 3 failed, drive 5 failed during the rebuild, and then, as you might have guessed, drive 4 failed. Ironically, all three drives, when sent back to the manufacturer, were perfectly fine.

Most array companies are not doing much to solve vibration issues, assuming that it's easier to just replace the falsely failed units than to redesign their architectures. A few companies seem to be out in front on this issue. In our third entry on the subject, we will cover some techniques that companies like Xiotech and Copan Systems are using to reduce drive vibration in their specific environments.

George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.
