With shrinking IT budgets you will at some point be tempted to fill your RAID controllers with cheap desktop hard drives. You may also be tempted to replace your servers' system drives with them. I'm going to give you some reasons why you should consider buying those pricey enterprise class drives instead.
This "feature" is the one that made me want to write this whole entry. Where I used to work, I heard about a group whose hard drives kept dropping out of their RAID arrays for no apparent reason and then reappearing. It turned out they were using desktop class hard drives with their RAID controllers. With enough research they found out why this was happening.
Error Recovery Control (ERC), Command Completion Time Limit (CCTL), or Time-Limited Error Recovery (TLER). Every manufacturer seems to have a different name for this, but it's all the same idea.
Modern hard drives have internal error recovery algorithms that can take upwards of a minute to recover and re-map data that the drive cannot easily read. Many RAID controllers will drop a non-responsive drive after about 8 seconds, though the exact timeout depends on the controller. This can cause the array to drop a good drive because it has not been given enough time to complete its internal error recovery procedure. This leaves the rest of the array vulnerable. Many enterprise class drives limit this error recovery time and prevent this problem.
Enterprise (sometimes called "RAID CLASS") hard drives have this limit set to around 7 to 8 seconds. If the disk has not recovered from the error by then, it will issue an error message to the RAID card and defer the error recovery until a later time. This lets the RAID card decide how to handle the recovery.
Desktop drives have this feature turned off. They are built on the assumption that there is no RAID controller to help with error recovery, so the drive should do everything possible to complete the error correction itself. During the error recovery process the disk becomes non-responsive and does not issue any type of error message. That is the point when the RAID card marks the drive as bad and removes it from the array.
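To make the timing problem concrete, here is a small sketch of the interaction described above. It is a toy model, not how any real controller or drive firmware works: the `Drive` class, the `controller_outcome` function, and the exact numbers (a 7 second drive limit, an 8 second controller timeout) are assumptions picked to match the rough figures in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    name: str
    # Seconds the drive spends on internal recovery before giving up and
    # reporting an error. None means recovery is unbounded (desktop behaviour).
    erc_limit_s: Optional[float]


def controller_outcome(drive: Drive, recovery_needed_s: float,
                       controller_timeout_s: float = 8.0) -> str:
    """Return what a hypothetical RAID controller does for one hard-to-read sector."""
    # The drive stays silent for the full recovery time unless its limit cuts it short.
    if drive.erc_limit_s is None:
        silent_s = recovery_needed_s
    else:
        silent_s = min(recovery_needed_s, drive.erc_limit_s)

    if silent_s >= controller_timeout_s:
        return "drive dropped from array"
    if drive.erc_limit_s is not None and recovery_needed_s > drive.erc_limit_s:
        return "error reported to controller"
    return "read completed after internal recovery"


enterprise = Drive("enterprise", erc_limit_s=7.0)  # assumed limit
desktop = Drive("desktop", erc_limit_s=None)       # no limit

for drive in (enterprise, desktop):
    # Assume the sector needs 45 seconds of internal recovery.
    print(drive.name, "->", controller_outcome(drive, recovery_needed_s=45.0))
```

With a 45 second recovery the enterprise drive reports an error before the controller's timeout expires, while the desktop drive stays silent long enough to be dropped. For a short recovery of a couple of seconds, both drives behave the same.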
All hard drives vibrate during normal spinning and seek operations, but a lot of vibration can cause real issues. When you pack 8, 12, 16, 24 or more drives close together into a case and then add lots of high-speed fans to cool them off, you're not doing your hard drives any favors when it comes to vibration. On top of that you probably have a bunch of machines bolted to a rack vibrating each other, not to mention the vibration from the HVAC systems.
All this vibration can disrupt the operation of a drive.
Excess vibration can lower drive performance by shaking the drive head off its intended path. The disrupted drive must then wait for its head to move back into position before resuming normal operation. It can also cause a larger number of logged medium errors and increase the frequency of drives being marked offline by the RAID controller. Don't be surprised if you also see a higher rate of data corruption in high-vibration environments.
Don't believe vibration affects hard drives that much? Check out this YouTube video. It's amusing and informative all at the same time.
Most enterprise class RAID drives combat vibration-induced performance degradation with sensors that allow the drives to tolerate a larger rotational vibration window. These sensors let the drive compensate for rotational vibration by using a closed feedback loop between the head and the spindle: the drive senses vibration anomalies and adjusts the head accordingly.
Enterprise class drives usually have more servo wedges in the disk's tracks compared to desktop drives. These wedges are used to determine the location of the head in relation to the track. If a head misalignment is detected, the drive will hold the read or write and wait for the target location to come under the head again. Enterprise drives use dedicated servo and data path processors along with servo algorithms in the drive's firmware to help with this compensation.
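Some back-of-the-envelope arithmetic shows why both points matter. Waiting for the target location to come around again costs up to one full revolution, and more wedges per track means the drive checks its head position more often. The sketch below is only illustrative: the function names are mine, and the wedge counts of 200 and 400 are made-up values, not figures from any datasheet.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def wedge_interval_us(rpm: float, wedges_per_track: int) -> float:
    """Time between two consecutive servo wedges passing under the head, in microseconds."""
    return revolution_ms(rpm) * 1000.0 / wedges_per_track


rpm = 7200
print(f"one missed revolution costs {revolution_ms(rpm):.2f} ms")

# Hypothetical wedge counts, for comparison only.
for wedges in (200, 400):
    print(f"{wedges} wedges: position sampled every {wedge_interval_us(rpm, wedges):.1f} us")
```

At 7200 RPM a single held read or write costs a little over 8 ms, which is a long time next to a normal transfer. Doubling the number of wedges halves the time the head can drift before the drive notices.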
Desktop class drives don't have to deal with as many vibration issues, which means they don't need sophisticated mechanisms to compensate for vibration-induced errors. Desktop class drives usually have fewer servo wedges and only one combined servo and data path processor with no firmware compensation algorithms. This means they are more susceptible to rotational vibration errors.
Without these vibration sensors, extra servo wedges, dedicated servo and data path processors, and compensation algorithms, you will likely see symptoms of vibration-related errors, including lower drive performance, a larger number of medium errors, and an increased frequency of drives marked offline by the RAID controller.
Enterprise-class drives usually implement some type of "end-to-end" error detection in their design. With this system, data moving from one end of the drive to the other is accompanied by some type of parity or checksum at every stage. This allows data transmission errors to be detected, and in some cases corrected or retransmitted.
Desktop systems will have some error detection in their subsystems, but they do not usually provide this type of end-to-end data protection. They typically do not incorporate things like Error Correction Code (ECC) in system memory or drive memory buffers. Enterprise class drives will use error detection at every stage of data transmission within the system.
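The general idea of carrying a checksum alongside the data can be sketched in a few lines. This is not how any particular drive implements it: the `protect`, `transfer`, and `verify` helpers are hypothetical, and CRC32 from Python's standard `zlib` module simply stands in for whatever parity or checksum scheme a real drive uses.

```python
import zlib
from typing import Tuple

Block = Tuple[bytes, int]  # (payload, checksum travelling with it)


def protect(payload: bytes) -> Block:
    """Attach a CRC32 to the payload when it first enters the data path."""
    return payload, zlib.crc32(payload)


def transfer(block: Block, corrupt: bool = False) -> Block:
    """Hypothetical hop between two stages; optionally flips one bit in transit."""
    payload, crc = block
    if corrupt:
        payload = bytes([payload[0] ^ 0x01]) + payload[1:]
    return payload, crc


def verify(block: Block) -> bool:
    """Recompute the checksum at the receiving stage and compare."""
    payload, crc = block
    return zlib.crc32(payload) == crc


block = protect(b"sector data")
print(verify(transfer(block)))                # clean hop
print(verify(transfer(block, corrupt=True)))  # bit flipped in transit
```

The clean hop verifies and the corrupted one does not. Because the checksum is created once and checked at every stage, a bit flipped anywhere along the path is caught at the next check instead of being silently written to disk.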
Enterprise class hard drives are designed to run 24/7/365. Desktop drives are not. When the manufacturers make the drives they take this into consideration. Enterprise workloads create greater wear on bearings, motors, actuators, etc. This generates additional heat and vibration. Enterprise class drives are designed with heavy duty components and drive firmware programming to meet the rigors of this environment.
Examples of higher quality include enterprise drives having the spindle anchored at both ends, while desktop drives do not. Head stack assemblies in enterprise drives have higher structural rigidity and a lower-inertia design, compared to the lighter-weight, higher-inertia head stack assemblies in desktop drives. Enterprise drives have larger magnets and air turbulence controls, while desktop drives have smaller magnets and no air turbulence compensation in relation to the actuator mechanics.
Why do you think those desktop drives cost so much less than the enterprise class ones? Have a look at the mean time between failures (MTBF) for an enterprise class drive. It is usually above 1 million hours. Desktop drives are in the hundreds of thousands of hours. Does this mean any given enterprise class drive will last longer? No! But it does mean that the odds are higher that it will.
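MTBF numbers are easier to reason about once you turn them into a yearly failure probability. The sketch below uses the common simplification of a constant failure rate and a drive that is powered on around the clock. Both are assumptions, and the 500,000 hour desktop figure is just a stand-in for "hundreds of thousands of hours".

```python
import math

HOURS_PER_YEAR = 24 * 365  # drive assumed to run around the clock


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that a drive fails within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for label, mtbf in (("enterprise", 1_000_000), ("desktop", 500_000)):
    afr = annualized_failure_rate(mtbf)
    # Expected failures per year in a hypothetical fleet of 100 drives.
    print(f"{label}: AFR {afr:.2%}, about {afr * 100:.1f} failures per 100 drives per year")
```

Under these assumptions the enterprise drive comes out a bit under 1% per year and the desktop drive a bit under 2%. Neither number says anything about one particular drive, but across a rack full of arrays the difference adds up.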
You can use the above info as a list of features to look for in a good enterprise class RAID drive, but it's more of an informative warning. Enterprise class hard drives cost more for a reason. They are used in different environments than desktop machines. I know this sounds like a big ad for the hard drive manufacturers, but spend the money and get the enterprise class (or "RAID CLASS") drives for your RAID arrays and system drives. If your data is important enough to RAID, then it's important enough to buy the right drives.