Posted on 08/04/2021 3:34:04 AM PDT by CptnObvious
The Dirty Ticking Time Bomb of Computing is About to Come Out!
The Disk Manufacturers all know it. The Operating System producers have known it for decades. But now the Dirty Ticking Time Bomb of Computing is about to come out.
It does not matter whether you are on a Laptop, Desktop, Smartphone or even a mainframe, the Ticking Time Bomb is on your system and you probably never knew it.
It is under the heading of SPARES. Specifically, it goes off when your drive runs out of them.
It also involves a couple of critical areas on your system disk, such as Sector 0. If Sector 0 is bad, typically it's all over on the next boot. KAPOW!!
Your Smartphone is DEAD AND GONE. Your $1000 Apple Watch is Kaput. The Data on your PC system disk is GONE and not recoverable.
In a Nutshell, the Time Bomb is WHEN, NOT IF, YOU RUN OUT OF "SPARES" and the next SECTOR IS CRITICAL. KABOOM!!
It does not matter what type of disk it is. Whether it is a Hard Drive, Solid State Drive, or even the new NVMe Drives. THE TICKING TIME BOMB IS THERE.
For when the Spares are gone, God only knows what will happen.
And what the industry is praying for is that you keep buying the new stuff, so that the old stuff, which is wearing out, is gone before the TIME BOMBS start Going Off.
BUT THERE IS GOOD NEWS:
Much of the Solid State Drive industry has standardized the reporting of Spare Usage so that the operating systems can know beforehand that the problem is approaching.
And the Operating System Vendors are getting ready to be able to watch out for the Time Bombs.
For instance, the Windows 10 major update 21H2, scheduled for the October/November 2021 time frame, has a feature which can report the "Lifecycle %" left on an SSD. My guess is that it refers to the Spares left on the Drive.
From there, it is only one step to giving a warning message like "2% Lifecycle left on C:".
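That warning step could look something like the Python sketch below. It assumes the drive exposes the NVMe "Percentage Used" health value (a vendor estimate of life consumed, which the NVMe spec allows to go past 100). The function name and the 2% threshold are made up for illustration; this is not how Windows actually implements its feature.

```python
from typing import Optional


def lifecycle_warning(drive: str, percentage_used: int, warn_at: int = 2) -> Optional[str]:
    """Hypothetical helper: turn an NVMe 'Percentage Used' value into a warning.

    Percentage Used is a vendor estimate of life consumed and may exceed 100,
    so the remaining life is clamped at 0.
    """
    remaining = max(0, 100 - percentage_used)
    if remaining <= warn_at:
        return f"{remaining}% Lifecycle left on {drive}"
    return None  # still healthy, nothing to report
```

For example, `lifecycle_warning("C:", 98)` yields "2% Lifecycle left on C:", while a drive at 50% used produces no warning.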
Disk drives in general have been so good that few have experienced THE BOMB. And a few folks have used RAID on their system volumes to protect themselves.
But this has been a silent killer that nobody wants to talk about, and it is finally being addressed.
We need to do something pronto!
Bravo!
But but but I wrote so many interesting essays (at least to me) on my CompuServe account that are trapped in an old computer I can’t get started again. The humanity!
Looks like it hit your brain before it hit your hard drive.
Disk drive makers test for bad spots. Instead of having the OS maintain a bad-block list, the drive firmware performs magic to swap in "good" blocks for the "bad" blocks, using the over-provisioned (spare block) space. This testing continues as the drive is written to, so that a grown bad spot is covered by swapping as needed.
When the drive is unable to swap out bad space, the controller returns an error to the file system driver software. Sane file systems then try (and sometimes fail) to recover the data, mark the bad block as allocated permanently, then try to save the hopefully recovered data to another place, adjusting file extent lists as appropriate.
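To make the swap-then-fail behavior concrete, here is a toy Python model. It is a sketch under obvious simplifying assumptions (the class name, the layout of spares after the user-visible range, and the error type are all hypothetical); real firmware is far more involved. It shows logical sectors being transparently redirected to spares, and the error surfacing to the caller only once the spare pool is empty.

```python
class ToyDrive:
    """Toy model of firmware sector remapping (not any real drive's firmware)."""

    def __init__(self, user_sectors: int, spare_sectors: int):
        self.user_sectors = user_sectors
        # Hypothetical layout: spare sectors sit just past the user-visible range.
        self.free_spares = list(range(user_sectors, user_sectors + spare_sectors))
        self.remap = {}  # logical sector -> physical spare (a toy "grown defect list")

    def physical(self, lba: int) -> int:
        """Reads and writes follow the remap table transparently."""
        return self.remap.get(lba, lba)

    def mark_bad(self, lba: int) -> int:
        """Swap in a spare for a bad sector; fail once no spares remain."""
        if not 0 <= lba < self.user_sectors:
            raise ValueError("LBA out of range")
        if not self.free_spares:
            # Out of spares: the error now reaches the file system driver.
            raise OSError(f"unrecoverable: no spare left for LBA {lba}")
        spare = self.free_spares.pop(0)
        self.remap[lba] = spare
        return spare
```

With `ToyDrive(100, 2)`, the first two bad sectors are quietly remapped to physical sectors 100 and 101; the third one raises an error, which is the point where the file system's own recovery has to take over.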
As for the partition descriptor blocks, they are rarely written, so they rarely go bad. (Unless your drive fails so badly that it "spiral writes" through the on-disk partition table. If this happens, you have more problems than just a blown partition table.)
Information on the health of the drive is provided by the SMART (Self-Monitoring, Analysis and Reporting Technology) reporting system. Most professional operating systems regularly monitor the built-in health meters to warn you when a drive is approaching critical illness.
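On Linux, macOS, and Windows, the smartmontools `smartctl` utility can dump these health meters as JSON (`-j`, smartmontools 7.0 and later). The Python sketch below pulls out the ATA attributes usually tied to sparing: 5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count, 197 Current_Pending_Sector, and 198 Offline_Uncorrectable. The JSON field names (`ata_smart_attributes`, `table`, `raw.value`) are as I understand smartctl's output, so treat them as an assumption and check them against your own drive's report.

```python
import json
import subprocess

# ATA attribute IDs commonly associated with sector sparing (smartmontools naming):
# 5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable
WATCHED_IDS = {5, 196, 197, 198}


def sparing_counters(report: dict) -> dict:
    """Return {attribute name: raw count} for the watched attributes in a smartctl JSON report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {row["name"]: row["raw"]["value"] for row in table if row.get("id") in WATCHED_IDS}


def read_report(device: str) -> dict:
    """Run smartctl on a device (usually needs root/administrator rights)."""
    # smartctl's exit status is a bitmask, so a non-zero code is not necessarily fatal;
    # we parse stdout regardless.
    out = subprocess.run(["smartctl", "-a", "-j", device], capture_output=True, text=True)
    return json.loads(out.stdout)
```

Usage would be something like `sparing_counters(read_report("/dev/sda"))`. A Reallocated_Sector_Ct that keeps climbing is the spinning-rust version of "spares being used up."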
For Solid State Drive (SSD) devices, there is another technical method for extending the life of the drive. It's called TRIM (not an acronym), and it is a cooperation between the operating system's file system and the SSD: the locations of any unused space in the file system are reported to the SSD hardware. This marking lets the SSD know which blocks no longer hold live data, so it can use that open space for its housekeeping, including swapping good blocks for bad, similar to what is done with spinning rust. On Linux, an fstab(5) entry can have the "discard" option, which performs a TRIM call for every block released. This is a Bad Idea™ in that it can cause more wear than it fixes. Instead, a sane OS will do a "TRIM cycle" periodically; Ubuntu 20.04 (both Desktop and Server editions) does this once a week.
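One way to see why a periodic TRIM cycle is gentler than per-release discards: a batch run can merge neighboring free blocks into a few large ranges. The Python sketch below is a simplification (the function and the block numbers are hypothetical, and real kernels may already merge some online discards); it just counts requests. On Ubuntu the weekly batch job is the systemd `fstrim.timer`, which runs `fstrim` on mounted file systems.

```python
def coalesce(freed_blocks):
    """Merge freed block numbers into (start, length) ranges, as a batched TRIM might."""
    ranges = []
    for block in sorted(set(freed_blocks)):
        if ranges and ranges[-1][0] + ranges[-1][1] == block:
            start, length = ranges[-1]
            ranges[-1] = (start, length + 1)  # extend the current contiguous run
        else:
            ranges.append((block, 1))  # start a new run
    return ranges


freed = [10, 11, 12, 13, 40, 41, 99]
online_commands = len(freed)             # simplified "discard": one request per released block
batched_commands = len(coalesce(freed))  # periodic TRIM: one request per contiguous range
```

Here the seven released blocks collapse into three ranges, `[(10, 4), (40, 2), (99, 1)]`, so the batch job sends three requests instead of seven.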
SSD wear happens only on writes. So when the OS reports impending failures, stop writing to it, and perform a drive swap with a new device. Or make damn sure your backups are in pristine shape. Your choice.
This whole post is a crock. There have always been drive failures; your chosen world-ending failure mode is just one of many. The simple solution has been, and always will be, a good backup plan. Quit having public hissy fits, find a backbone, get yourself a decent backup plan and man or woman up.
I just saw this banner ad when I was searching Pinterest for quiche recipes. “Fix your hard drive by doing this one weird trick.”
I'm still waiting.
You spent a lot of time on that! But it was worth the read! :)
“Sometimes, people with limited tech knowledge can read something and panic.”
Did the original person above use to deliver mail for USPS and hang out in a bar with Norm?
Reminds me of the person who took their computer tower in for repair because the ‘coffee cup-holder’ was broken. They had been using the CDROM drive to hold their coffee cup, and.... what a surprise! It broke.
Or back in the day, when a new printer came with software and the instructions said ‘show your computer your new printer’. So.... the lady turned the computer monitor toward the printer so the computer could ‘see’ the new printer.
There are many more of these. Like the computer with a ‘foot pedal.’ Who can tell us that one?
Not really. copy - paste - alter a few words ...
Ty!
Yes, all Drives have Spares. And yes, all Hard Drives and non-NVMe Solid State Drives have Spare Sectors to replace suspect ones. On NVMe Drives it is Spare Blocks instead.
These Spares are held in reserve beyond the original size of the disk, so you lose none of the capacity when they are used.
Today's drives spare automatically: when an error in a Sector or Block (NVMe) is corrected via firmware, the data is written onto a Spare instead, and the errored Sector/Block is marked suspect and not used anymore. The reason is that if the Sector/Block were not spared, the next time around it might be UNCORRECTABLE, and a critical function or program could fail when that error is reported.
On Hard Drives today, when a sector is spared, the user may see a short delay depending upon the retries allowed in firmware. There generally is no notice given to the user that this is occurring. The Spare sector comes from the SLOWEST PART of the Drive nearest the Spindle.
On SSDs/NVMe drives, the sparing occurs virtually instantaneously, with no delay to be seen and no report to the user. The spares are as good as any other sector/block on the drive, speed-wise.
Scheduled Defragging (for hard drives) reallocates the data on the used spare sector, along with the rest of the file, back to a faster part of the hard drive, and sets the used spare sector to the end of free space, thus helping the hard drive work better.
The Spare sector table and its count on a Hard Drive are generally known as the G-list (the "grown" defect list, as opposed to the factory P-list).
https://www.dataclinic.co.uk/hard-drive-defects-table/
BUT SCHEDULED DEFRAGGING and WINDOWS FILE INDEXING ARE VICIOUS ENEMIES OF SOLID STATE C: DRIVES and busy NVMe Drives. TURN THESE OFF for SSDs and NVMe Drives NOW!!
SSDs and NVMe drives depend on Non-Volatile Memory, which can be written only so many times before it starts to error. I've heard this is about 66 thousand writes at the worst.
To fight this limited write endurance, SSD/NVMe drives use smart write caching and sneaky write distribution methods (wear leveling) to reduce the impact of excessive writes on these drives.
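The "sneaky write distribution" idea can be sketched as a toy dynamic wear leveler in Python. Each write goes out-of-place to the least-erased free block, and the stale copy is erased and returned to the pool. The class is hypothetical and leaves out everything a real flash controller also does (static wear leveling, garbage collection, caching, spare management); it only shows why hammering one logical location does not hammer one physical block.

```python
import heapq


class WearLeveler:
    """Toy dynamic wear leveling: each write lands on the least-erased free block."""

    def __init__(self, n_blocks: int):
        self.erase_counts = [0] * n_blocks
        self.free = [(0, b) for b in range(n_blocks)]  # heap of (erase_count, block)
        heapq.heapify(self.free)
        self.mapping = {}  # logical page -> physical block holding its current data

    def write(self, logical: int) -> int:
        """Write a logical page out-of-place and return the physical block used."""
        old = self.mapping.get(logical)
        _, block = heapq.heappop(self.free)  # least-worn free block
        self.mapping[logical] = block
        if old is not None:
            # The previous copy is now stale: erase it and put it back in the pool.
            self.erase_counts[old] += 1
            heapq.heappush(self.free, (self.erase_counts[old], old))
        return block
```

Rewriting the same logical page 100 times on a 4-block toy device spreads the 99 erases across all four blocks (24 or 25 each) instead of wearing out a single block.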
On NVMe drives, the spares are reported in the SMART/Health log as "Available Spare" (a normalized percentage), along with an "Available Spare Threshold": https://advdownload.advantech.com/productfile/Downloadfile3/1-1YX8KBB/SQFlash%20SMART%20ID%20Definition(NVMe)_v1.1_20200922.pdf
Available Spare Threshold: When the Available Spare falls below the threshold indicated in this field, an asynchronous event completion may occur. The value is indicated as a normalized percentage (0% to 100%).
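A monitoring script could act on those two numbers as in the Python sketch below. It assumes the health log has already been read into a dict whose keys match the `nvme_smart_health_information_log` section of `smartctl -j` output (`available_spare`, `available_spare_threshold`, `critical_warning`); those key names are an assumption to verify on your system. Bit 0 of the NVMe Critical Warning byte is the drive's own "available spare below threshold" flag.

```python
def spare_status(log: dict) -> str:
    """Classify spare health from an NVMe SMART/Health log given as a dict."""
    spare = log["available_spare"]                # normalized % of spare capacity left
    threshold = log["available_spare_threshold"]  # vendor-set warning level, in %
    critical = log.get("critical_warning", 0)     # bit field; bit 0 = spare below threshold
    if spare == 0:
        return "no spares left"
    if critical & 0x01 or spare < threshold:
        return "below threshold"
    return "ok"
```

A healthy new drive typically reports 100% spare with a threshold around 10%, which comes back as "ok". The "below threshold" case is the early warning that the original post wants operating systems to surface.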
When the spare sectors behind the "G-list" on hard drives and non-NVMe drives run out, THAT'S when KABLAM can happen. The same goes for NVMe drives when the "Available Spare" count is ignored and goes to Zero...
And the Majority of you are Right. Most wise businesses use RAID or spare backup system disks. But there is no RAID for Smartphones/Smart Watches/iPads, and most general PC users do not know about it, or why one would want it.
Yes, the Dirty Ticking Time Bombs are out there. The good news is that the drive manufacturers are starting to report the Spares data in a way that can be meaningful, and eventually useful to Operating Systems for catching the TIME BOMBS before they Go OFF.
As far as I know, the only way today to see the G-Lists and AVAILABLE SPARES list is with the manufacturers' software. If any of you know of Operating Systems that can see these and do anything with the data now, PLEASE LET ME KNOW.
Few know about spares and generally what they know about them is far from reality.
Thanks.
A very Old Computer Tech that has waited for this day for a long time.
Any tips you know for reducing writes, especially on the Primary System Volume, would be greatly appreciated.
I do a lot of Video processing these days and am writing to the system volume in a lot of ways I didn't expect. Even when I write directly to a USB stick, a lot still gets written to the system volume first: the target file stays at 0 bytes until the processing reaches a certain point or is done.
And while this is being done, the USB stick and the SSD C: drive get HOT! I have another system where C: is an NVMe drive, and I worry about that one getting HOT AS WELL.
My Dell Laptop had a nasty SSD in its M.2 Slot. Even though it had a heat sink, the heat caused it to SLOW DOWN A LOT. I replaced it with a Nice Samsung NVMe and WOW, what a difference. So you can see why I want 21H2, where the SMART TEMPERATURE AND LIFECYCLE data can be seen. As long as the SPARES used count is ZERO, I can keep from worrying.
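Whether a given video tool does this is unknown, but many programs keep their scratch data in the OS temp folder (TEMP/TMP on Windows, TMPDIR elsewhere), and that folder can often be pointed at another drive. For your own scripts, the pattern in the Python sketch below keeps intermediate writes on the destination drive: stage the output in a temp file in the destination folder, then rename it into place once finished. The function name and the `.part` suffix are illustrative choices, not anything your video software is known to use.

```python
import os
import tempfile


def write_output(chunks, final_path: str) -> None:
    """Write chunks to a temp file next to final_path, then rename it into place.

    Because the temp file lives in the destination directory, the intermediate
    data lands on the destination drive rather than the system volume.
    """
    target_dir = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_path, final_path)  # atomic rename within the same file system
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written .part file behind
        raise
```

This also explains the 0-byte target file the poster saw: with this pattern, the final name only appears with full contents at the end. Staging on the system drive instead would mean every byte is written twice, once on C: and once on the stick.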
Thanks, CO
Funny!!
Back up your hard drive.
I use FreeFileSync for the copy/paste to back up files.
On the left side you Compare, and on the right side, where your files will be backed up, you Synchronize using Mirror.
I also have a paid one that makes a backup automatically once a week, called EaseUS Todo Backup Home.
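For anyone curious what a "Mirror" sync means, here is the general idea as a small Python sketch: make the backup folder match the source by copying new or changed files and deleting anything the source no longer has. This is my own illustration of one-way mirroring, not FreeFileSync's actual algorithm, and it ignores symlinks, permissions, and error handling.

```python
import filecmp
import os
import shutil


def mirror(src: str, dst: str) -> None:
    """One-way mirror: make dst match src (copy new/changed files, delete extras)."""
    os.makedirs(dst, exist_ok=True)
    src_names = set(os.listdir(src))
    # Delete anything in the backup that no longer exists in the source.
    for name in set(os.listdir(dst)) - src_names:
        path = os.path.join(dst, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    for name in src_names:
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            if os.path.exists(d) and not os.path.isdir(d):
                os.remove(d)  # a file is in the way of a folder
            mirror(s, d)
        elif not (os.path.isfile(d) and filecmp.cmp(s, d, shallow=False)):
            if os.path.isdir(d):
                shutil.rmtree(d)  # a folder is in the way of a file
            shutil.copy2(s, d)  # new or changed: copy content and timestamps
```

Be aware that the delete step is why Mirror is dangerous if you point it the wrong way: whatever is missing from the left side gets removed from the right.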
I lost one of those. In all fairness, I didn't drop it more than a couple of times.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.