Posted on 07/12/2012 10:34:47 AM PDT by Ernest_at_the_Beach
I'm going to let the cat out of the bag right here and now. Everyone's home RAID is likely an accident waiting to happen. If you're using regular consumer drives in a large array, there are some very simple (and likely) scenarios that can cause it to fail completely. I'm guilty of operating under this same false hope - I have an 8-drive array of 3TB WD Caviar Greens in a RAID-5. For the uninitiated, RAID-5 is where one drive's worth of capacity is set aside for parity data, which is distributed among all drives in the array. This trick means no data is lost when a single drive fails: the RAID controller simply recomputes the missing data by running the remaining data and parity through the same formula that created the parity. This is called redundancy, but I propose that it's not.
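RAID-5 parity is conventionally computed with XOR, which is its own inverse: XORing the surviving blocks of a stripe (data plus parity) reproduces the missing block. The minimal sketch below shows that for a single stripe. The block contents and the four-data-drive layout are made up for illustration; real controllers rotate the parity block across drives and work on much larger stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe, missing_index):
    """Recover the block at missing_index from every other block in the stripe."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


# Hypothetical stripe: four data drives with 4-byte blocks, plus parity.
data = [b"WDGR", b"EEN3", b"TB!!", b"RAID"]
stripe = make_stripe(data)
lost = 2  # pretend the third drive has failed
assert rebuild_missing(stripe, lost) == data[lost]
```

Because the same XOR recovers any single missing block, including the parity block itself, one failed drive is survivable; a second missing block in the same stripe is not.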
Since I'm also guilty here with my huge array of Caviar Greens, let me also say that every few weeks I have a batch job that reads *all* data from that array. Why on earth would I need to occasionally and repeatedly read 21TB of data from something that should already be super reliable? Here's the failure scenario for what might happen to me if I didn't:
At this point the way forward varies from controller to controller, but the long and short of it is that the data is at extreme risk of loss. There are ways to get it all back (most likely minus that one bad sector on drive 3), but none of them are particularly easy. Now you may be asking yourself how enterprises run huge RAIDs without seeing this sort of problem. The answer is Time Limited Error Recovery (TLER), where the hard drive assumes it is part of an array, assumes there is redundancy, and is not afraid to quickly tell the host controller that it just can't complete the current I/O request. Here's how that scenario would have played out if the drives implemented some form of TLER:
The above scenario is what would play out with an Areca RAID controller (I've verified this personally). Other controllers may behave differently. A controller unable to do a bad sector remap might have just marked drive 1 as bad, but the key is that the rebuild would be much less likely to fail, as drive 3 would not drop completely offline once the controller ran into the additional bad sector. The moral of this story is that typical consumer-grade drives have data error timeouts far longer than the drive-offline timeout of typical RAID controllers, and without some form of TLER, two bad sectors (1024 bytes in total) are all it takes to put multiple terabytes of data in grave danger.
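The failure described above comes down to a race between two timers: how long a drive keeps retrying a bad sector internally, and how long the controller waits before declaring the drive dead. The toy model below replays our reading of the drive 1 / drive 3 story under that assumption. It is a simplification, not how any particular controller (Areca or otherwise) actually works, and all timing constants are assumed values: 7 seconds is the figure commonly cited for WD's TLER, while the consumer retry time and the controller timeout are placeholders.

```python
from dataclasses import dataclass

# Assumed timings in seconds, for illustration only; real values vary by
# drive model, firmware, and controller.
CONTROLLER_TIMEOUT = 8.0    # controller drops a drive that stays silent this long
CONSUMER_RETRY_TIME = 60.0  # desktop drive retrying a bad sector on its own
TLER_LIMIT = 7.0            # TLER-style drive giving up and reporting the error


@dataclass
class Outcome:
    array_survives: bool
    drives_dropped: int
    sectors_lost: int


def run_scenario(bad_sector_hits, recovery_time, initially_failed=(), redundancy=1):
    """Replay bad-sector hits (in order) against a parity array.

    redundancy=1 models RAID-5: one missing drive is tolerated, two are not.
    """
    dropped = set(initially_failed)
    sectors_lost = 0
    for drive in bad_sector_hits:
        if recovery_time > CONTROLLER_TIMEOUT:
            # The drive is still retrying when the controller runs out of
            # patience, so the controller treats the whole drive as failed.
            dropped.add(drive)
            if len(dropped) > redundancy:
                return Outcome(False, len(dropped), sectors_lost)
        elif len(dropped) < redundancy:
            # Error reported quickly and parity is intact: the controller
            # rebuilds the sector from parity and has the drive remap it.
            continue
        else:
            # Degraded array: no parity left for this stripe, so one sector
            # is lost, but the drive stays online and the rest survives.
            sectors_lost += 1
    return Outcome(True, len(dropped), sectors_lost)


for label, t in [("consumer", CONSUMER_RETRY_TIME), ("TLER", TLER_LIMIT)]:
    print(label, run_scenario(["drive 1", "drive 3"], t))
# consumer Outcome(array_survives=False, drives_dropped=2, sectors_lost=0)
# TLER Outcome(array_survives=True, drives_dropped=0, sectors_lost=0)
```

With a drive already gone, the TLER case still survives but gives up one sector, which matches the "most likely minus that one bad sector on drive 3" outcome: `run_scenario(["drive 3"], TLER_LIMIT, initially_failed={"drive 1"})`.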
The Solution:
The solution should be simple - just get some drives with TLER. The problem is that until now those were prohibitively expensive. Enterprise drives have all sorts of added features, like accelerometers and pressure sensors that compensate for sliding in and out of a server rack while operating and for the rapid pressure changes that occur when the server room door opens and the forced air circulation takes a quick detour. Those features just aren't needed in the home NAS sitting on your bookshelf. What *is* needed is a Caviar Green with TLER, and Western Digital aims to deliver that, among other things:
Thought this might be useful... learning something about RAID.
WD Greens are not intended for use in NAS devices. Despite the higher price, I opted for some Seagates with TLER. I do not, however, have an 8-disk NAS.
No DASD protection is completely fail-safe.
How about a RAID array built with drives all from the same lot number? Time goes by and the array enters its MTBF window. My, oh my.
I stopped reading at this point. With prices what they are today, anyone who puts 3 TB SATA drives into a RAID 5 array is foolish.
The whole point of having a RAID array is to have redundancy. With RAID 5, if one drive fails, you're OK. Lose two drives, and you've lost all of the data (I've seen it happen. Not often, but enough).
The problem with using RAID5 with such large disks is the rebuild time. Depending on the controller and the disk utilization (how hard you are making it work), it can take days for an array made of 3TB disks to rebuild itself. And during that time, you're at risk of losing everything.
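A rough back-of-the-envelope version of this worry can be sketched in a few lines. Two assumptions are doing all the work here and are labeled as such: a sustained rebuild speed of 100 MB/s (an idealized figure; under real load it is far slower, which is how rebuilds stretch into days), and the unrecoverable-read-error rate of 1 per 10^14 bits commonly quoted on consumer drive spec sheets, treated as an independent per-bit probability. That second step is a simplification, and real-world error rates are often better than the spec sheet, so read the result as an illustration of scale, not a prediction.

```python
import math

TB = 10**12  # drive makers quote decimal terabytes


def rebuild_hours(drive_bytes, mb_per_s):
    """Best case: time to write one replacement drive end to end."""
    return drive_bytes / (mb_per_s * 10**6) / 3600


def p_unreadable_sector(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading
    bytes_read, treating each bit as an independent event at the quoted rate."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Rebuilding one 3TB member of an 8 x 3TB RAID-5 means reading the other
# seven drives in full: 21TB.
print(f"{rebuild_hours(3 * TB, 100):.1f} h")    # 8.3 h
print(f"{p_unreadable_sector(21 * TB):.0%}")    # 81%
```

The same function with `ure_per_bit=1e-15` (a figure often quoted for enterprise drives) gives roughly 15%, which is one way to see why drive class and array size both matter.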
So, what do I do? Well, if I were on a budget, I'd use RAID 6. That uses two disks' worth of capacity for parity. And I'd get regular backups.
If I had money to burn, I'd use smaller, faster, more reliable drives. Still would get backups, though. :-)
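For the 8 x 3TB array from the article, the capacity you give up for each RAID level mentioned in this thread is easy to tabulate. The sketch below uses the standard capacity formulas. "Survives" means the number of drive failures each level tolerates no matter which drives fail; RAID 10 can sometimes survive more if the failures land in different mirror pairs.

```python
def raid_summary(level, n_drives, drive_tb):
    """Return (usable TB, drive failures always survived) for equal-size drives."""
    if level == "RAID 5":
        return (n_drives - 1) * drive_tb, 1   # one drive's worth of parity
    if level == "RAID 6":
        return (n_drives - 2) * drive_tb, 2   # two drives' worth of parity
    if level == "RAID 10":
        if n_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        return (n_drives // 2) * drive_tb, 1  # half the drives are mirrors
    raise ValueError(f"unknown level: {level}")


for level in ("RAID 5", "RAID 6", "RAID 10"):
    usable, survives = raid_summary(level, 8, 3)
    print(f"{level}: {usable} TB usable, survives any {survives} failure(s)")
# RAID 5: 21 TB usable, survives any 1 failure(s)
# RAID 6: 18 TB usable, survives any 2 failure(s)
# RAID 10: 12 TB usable, survives any 1 failure(s)
```

Going from RAID 5 to RAID 6 on this array costs one more drive's worth of space (3TB) in exchange for surviving a second failure or a bad sector during a rebuild.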
I am no computer expert, but I just lost an HP motherboard along with its primary HD - TOTAL failure of both - and suspected it was caused by a WD (Western Digital) 1TB external that had been giving me “do you want to scan and fix...?” errors. I lost a good deal of data at a very busy time. (We always fail to back up when we are busiest, right?) In any case, please share your thoughts. The guy said Hitachi internals and ASUS motherboards were the only ones worth having and HP stuff is crap - from temperature problems to just not being able to handle externals. I had a new computer made and am almost finished licking my wounds. I must admit I don’t understand everything in this post, but I thought my recent experience might help FRiends, or encourage you all to give me your thoughts and advice.
I do this every day. We had one problem, a long time ago, where EMC got a bad batch of drives from the manufacturer (Seagate? don't remember). I couldn't keep drives in the #$@##%$@% array, to save my life. Fortunately, everything was still under warranty as replacements were a couple of grand each.
But, so long as you don't go cheap, the problem with drives from the same lot isn't as bad as you'd think. Most non-consumer-level equipment has a 5-year warranty on it as well, so anything you might lose on burn-in will be covered.
It's just a matter of being smart (see my comments on R5 vs R6 above) and allocating your disks correctly.
I've not had good luck with HP. I use Dell, because I like their support. As a business customer, I only talk to Americans (usually in TX or OK).
And....I just had a thought. Was the author talking about having a 21 TB (configured) system for his home usage? Who the heck needs that much space? You can fit (for instance) several hundred movies on a 1 TB drive. What would you do with 21x that much?
Well, if Mr. Malventano were smart, he would use Seagate drives. Western Digital drives have failed on me too many times. I’d never again use a WD drive. They are trash and fail regularly.

"RAID!!!!!"
“HP stuff is crap -”
Just got my main production desktop working again. My 4-year-old HP would die after a random 20 minutes. Did virus checks. Replaced the power supply. Cloned the now-screwed-up Western Digital hard drive and got it working again, but still had shutdown errors. It ended up being either motherboard or processor faults. I feel your pain.
Shocking. A whole article on the mistake of not selecting TLER drives, or of not turning TLER on via a free, downloadable program. And WD Caviar Greens? Why not the WD RE model, which is made for RAID and enterprise storage, i.e., valuable data?
All the more disturbing when one realizes this is a known deficiency in the Caviar drives, that it’s mentioned all over Newegg reviews, and that WD has a free, downloadable TLER control program to turn the feature on or off.
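The WD utility mentioned above is WD's own tool, and its interface isn't covered here. As a swapped-in alternative, smartmontools' `smartctl -l scterc,READ,WRITE` sets the standard SCT Error Recovery Control timeouts (in tenths of a second) on drives whose firmware supports it. Not every consumer drive does, and on many drives the setting is lost at power-off, so it typically has to be reapplied at boot. The helper names and the `/dev/sdb` device path below are hypothetical placeholders; the sketch defaults to a dry run that only prints the command.

```python
import subprocess


def scterc_command(device, read_ds=70, write_ds=70):
    """Build a smartctl command setting SCT Error Recovery Control.

    Times are in deciseconds, so 70 means 7.0 seconds for reads and writes.
    """
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def set_error_recovery(device, read_ds=70, write_ds=70, dry_run=True):
    """Return the command string (dry run) or run it and return smartctl's output."""
    cmd = scterc_command(device, read_ds, write_ds)
    if dry_run:
        return " ".join(cmd)
    # Needs root, smartmontools installed, and a drive that supports SCT ERC.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


print(set_error_recovery("/dev/sdb"))  # smartctl -l scterc,70,70 /dev/sdb
```

Running `smartctl -l scterc /dev/sdb` without values reports the drive's current setting, or that SCT ERC is unsupported.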
More depressing is that the author chose RAID 5 for archiving vice RAID 10 or several RAID 1s. RAID 5 = a quickly fragmented journal which = degraded performance and even data loss if not regularly defragged.
RAID 10s or RAID 1s are really the best way to go, imo.
The guy was cheap in the wrong way. He should have gone with a motherboard which has an ICH10R Southbridge for integrated RAID 1, 5, 10, etc. Then if something blows or he wants to upgrade he just gets another ICH10R Southbridge board and plunks the drives in and boots up.
Although I’m using Samsung SpinPoint F3s, my next disks will likely be Caviar Blacks with the TLER disabled. The Blacks use dual processors for added performance and reliability. They’re not that much more expensive than the Greens, either.
Western Digital, Seagate, Hitachi (and before them, IBM), have had white papers on their web sites warning people to NEVER use consumer grade hard drives in RAID arrays for many years.
These were especially geared towards people who would build up home brewed servers for small businesses.
I can’t begin to count how many failures of this sort I’ve seen over the years.
Mark
I have a 2TB WD drive that I back up my application data to at 5:00 every day. It saved my butt just last week when I lost the HD on my Dell laptop.
While I still had to load all the OS, and application stuff, the backup of my data worked perfectly.
Now if I could only get a good low/no cost full hard drive mirror on the WD drive. DO NOT USE the WD software for this as it would have taken over a month to back up everything on my laptop. There is a serious defect in their software.
Right now SyncToy handles all my data backup but the OS is still at risk.
I am not all that enamored with multiple drives in any RAID configuration. For my small business use, if I have a problem with the WD drive it would just get replaced that day, and the likelihood of losing another HD the same day is the stuff of tin foil hats or lightning strikes (been there, done that).
First off, HP is good stuff. Those old Compaq servers and HP hard drives last forever.
I’ve had your situation before, several times. I would check BIOS settings for improper over-clocking and more likely, insufficient voltages on the CPU and/or RAM. If not over-clocked I would go in and bump up both voltages 10% to compensate for aged components.
I would then re-seat the CPU and fan, then re-seat the RAM. If it still occurs, I would go down to one RAM stick and rotate through them until the faulty stick was identified. That’s about it.
HP IS crap, and in my opinion Hitachi makes the best laptop HDs. I gave up on Seagate a long time ago, and now have given up on the Western Digitals (except the RE-4s) as I’ve been seeing newer WDs failing left and right.
RAID in consumer products is an utter disaster. Much greater total probability of failing, much greater probability of losing all data, extremely difficult to recover from failure, and completely eliminates the possibility of using most hardware and software recovery tools.
Another problem with RAID5/6 for the consumer level is that it’s hard to grow. And when you replace a drive, you’re probably going to buy a significantly larger one, but what do you do with the extra space?
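The "what do you do with the extra space?" problem follows from how a conventional RAID-5 sizes its members: every drive is treated as if it were the size of the smallest one. The sketch below assumes exactly that behavior; the drive sizes in the example are made up.

```python
def raid5_usable(drive_sizes_tb):
    """Return (usable TB, stranded TB) for a conventional RAID-5.

    Every member counts as the size of the smallest drive; anything beyond
    that on a larger drive goes unused.
    """
    smallest = min(drive_sizes_tb)
    usable = (len(drive_sizes_tb) - 1) * smallest
    stranded = sum(drive_sizes_tb) - len(drive_sizes_tb) * smallest
    return usable, stranded


# Replace one failed 2TB drive in a four-drive array with a 4TB drive.
print(raid5_usable([2, 2, 2, 4]))  # (6, 2)
```

Until every member has been swapped for a larger drive, the extra 2TB on the new drive sits idle, whereas independent RAID 1 pairs can be upgraded one pair at a time.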
I’ve got more disk speed than I need, by far, so it’s RAID1 pairs for me.

I took a more expensive but safer and quicker path of protection than using a RAID. I copy each drive to a dedicated backup drive using SyncToy, copying only the new additions (files on the original drive but not yet on the backup drive).
I could simply take the backup drive from the backup computer and use it to replace the failed drive. I will eventually buy a new drive to put in the backup machine and let my nightly SyncToy backups build a new full backup drive.
The worst that can happen when any drive fails is a single drive replacement with the new drive already containing all the data. There is absolutely no danger of losing everything due to one or more drives failing.
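The "copy only the new additions" rule described above can be sketched in Python. This is not how SyncToy itself works internally; it is a swapped-in illustration of the same one-way idea, and the function name is hypothetical. Note the trade-off it makes explicit: a file that already exists on the backup is never overwritten, so later edits to that file on the original drive are not picked up.

```python
import shutil
from pathlib import Path


def copy_new_files(source, backup):
    """Copy files that exist under source but not yet under backup.

    Existing backup files are never overwritten or deleted, so a file that
    changes on the source after its first copy is NOT refreshed here.
    """
    source, backup = Path(source), Path(backup)
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(source)
        if dst.exists():
            continue  # already backed up; leave it alone
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(dst)
    return copied
```

Run nightly, something like `copy_new_files("D:/", "E:/")` (placeholder paths) keeps the backup drive a superset of the original, which is what lets it stand in for a failed drive.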
Other advantages are:
Quicker recovery time.
No slowdown on main machine while the array is rebuilt.
Less wear and tear on all the hard drives, by avoiding the multiple reads on each drive needed to rebuild a replacement drive from parity.
The benefits of doing it this way are:
No loss of everything due to a hard drive failure.
It all depends on the "grade" of system you're getting. There are different "lines" of computers by each manufacturer.
Right off the bat, when you're talking about servers, HP is spectacular. Their business grade systems are also very good. Their consumer grade machines are hit and miss, but in general, I've found them to be good, though not as rugged as their business class systems.
For what it's worth, I used to do warranty work on HP, IBM, and Dell servers and workstations, and on laptops from the previously mentioned brands plus Toshiba.
Hitachi does make awesome hard drives - they bought IBM's production facilities years ago. As with systems, there are HUGE differences in the design and reliability of commercial grade components vs consumer components. For instance, server hard drives are designed to spin at higher RPM, and can withstand much higher temperatures than the consumer grade drives. A good example is from about 8 years ago, where a manufacturer that will remain nameless mis-designed a case for airflow (there simply wasn't enough to cool the systems) and the heat was causing the drives to fail within just a few months. The manufacturer decided the fix was to replace the drives with new, server class drives that could survive the heat. Our company replaced several thousand drives for a couple of banks, a power company, and a couple of hospitals. They were (mostly) done as "pre-failure" warranty repairs.
Mark
Hitachi today is still manufacturing a line of drives with the Star suffix in its name.
See e-mail...
WD Black (enterprise class) drives in RAID here. Is this an issue for me?
There’s another problem with RAID5 of such large drives, especially consumer drives.
First, I’ve disassembled consumer ATA drives, and I’ve disassembled enterprise SCSI drives. There really is no comparison of the construction, the latter being built like a tank to extremely tight tolerances. It took me at least ten times as long to disassemble the SCSI drive. Those drives are meant for long-term constant running, consumer drives aren’t.
Second, as drives get bigger we have a rebuild time problem, and the rebuild time with 3 TB drives can be very long in a RAID5. During that time when all your disks are thrashing as hard as they can to rebuild across 8 of them, one more dead drive means you lose your data. RAID10 takes more disks, but a rebuild is just copying data from one drive to another, and death of a drive during that means you *may*, not *will*, lose data.
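The "*may*, not *will*" point can be made concrete. After one drive has failed, any second failure kills a RAID5, but in RAID10 only the loss of the dead drive's mirror partner is fatal. The sketch assumes the second failure is equally likely to hit any remaining drive. In practice the mirror partner is the drive being read hardest during a RAID10 rebuild, so that assumption flatters RAID10 somewhat.

```python
def p_second_failure_fatal(level, n_drives):
    """Chance that one more drive failure, after a first one, destroys the
    array, assuming every remaining drive is equally likely to be next."""
    remaining = n_drives - 1
    if level == "RAID 5":
        return 1.0  # any second loss exceeds single parity
    if level == "RAID 10":
        return 1 / remaining  # only the dead drive's mirror partner is fatal
    raise ValueError(f"unknown level: {level}")


for level in ("RAID 5", "RAID 10"):
    print(level, f"{p_second_failure_fatal(level, 8):.0%}")
# RAID 5 100%
# RAID 10 14%
```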
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.