This isn't much of an explanation. The Library of Congress has 20 terabytes of archived books. The entire collection fits on 20 hard drives (i.e. less than a cubic foot of physical space).
LLS
Exactly! And since you can buy a 1 terabyte hard drive at Staples for $150, it’s not like it’s prohibitively expensive. These guys can afford to buy a new building but they supposedly can’t afford to keep their data or convert it to electronic storage media? Yeah, right!!
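For what it's worth, the back-of-envelope arithmetic behind that claim can be written out using only the figures quoted above (20 terabytes, 1 terabyte per drive, $150 per drive). The function name and its parameters are made up for illustration, and it counts raw drives only, with no backups, labor, or format conversion, so treat it as a rough sketch rather than a real archival budget.

```python
import math

def drive_cost(total_tb, drive_tb=1, price_per_drive=150):
    """Return (number of drives, total price) for storing total_tb terabytes.

    Hypothetical helper: assumes identical drives and ignores redundancy.
    """
    drives = math.ceil(total_tb / drive_tb)  # round up to whole drives
    return drives, drives * price_per_drive

drives, cost = drive_cost(20)
print(f"{drives} drives, ${cost}")
```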
The problem is, if you've checked out HARRY's README file, their data management was a mess. If it's like most of the companies I've been in, there would have been little budget spent on managing that data properly and maintaining it in modern formats.
But I also believe that some people thought that by overwriting the original data with what they honestly thought was corrected data, they were helping things by making sure people didn't go back to it by mistake. On the other hand, I think some intentionally wanted it unavailable.
They are so busted! Even their lies aren't credible.
Maintaining long-term databases has been much more difficult than you apparently comprehend. I personally have data on written journals, punched paper tape, punched cards, 9-track (~12”) mag tape, 7-track tape, three different types of data tape cartridges, 8” floppy disk, 5-1/4” floppy (2 formats), 3-1/2” floppy, 100Mb Zip drive, 50Mb hard drive “cartridge”, dozens of Winchester hard drives ranging from 10Mb to 500Gb, CD (3 formats), DVD (2 formats), and several “solid state” formats. This does not by any means exhaust the possible formats.
The lab I last worked at has a large warehouse in which data from the past 25 years is stored on at least 8 of these types of formats. The manpower has never been available to transfer the older data to newer formats. Fortunately, people seldom ask for the older datasets since the new experimental results supersede older results in my field, and they are not dependent on those older results.
I AM NOT ABSOLVING CRU for dumping the tapes and records. What they have done here is unforgivable scientifically. They were given an international trust to maintain historical records of temperature, and were funded to do this. They should never have dumped this data, because they KNEW they were manipulating it in various ways for publication. They were TASKED to maintain this data! Many countries completely turned over all sorts of historical data they had collected trusting that CRU could and would do a better job of maintaining it than they could! That trust has been violated.
The preservation of historical data is a huge, huge problem in science. The formats change so rapidly, and many are fairly volatile, especially those that depend on magnetics. The databases really ought to be rewritten periodically, but manpower is seldom available.
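One reason periodic rewriting eats manpower is that every migration has to be checked: a copy onto new media is only useful if it can be shown to match the old one. A minimal sketch of that check, assuming the data sits in ordinary files under two directory trees, might look like the following. The names `sha256_of`, `build_manifest`, and `verify_copy` are hypothetical, and this is just one common approach (checksum manifests), not a description of what any particular lab does.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_copy(old_root, new_root):
    """Return the files from old_root that are missing or altered in new_root."""
    old = build_manifest(old_root)
    new = build_manifest(new_root)
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

An empty list from `verify_copy` means every file in the old archive has a byte-identical counterpart in the new one. It says nothing about whether anyone can still read the file format, which is the harder half of the problem.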
Consider the problem that Hubble researchers have: 5 years, or 50 years, or possibly even 250 years from now, there is a supernova 20,000 light years from Earth. That star had never gained attention. However, over the years, that section of sky has been imaged thousands of times. Suddenly, astronomers want to look at its history. Very important questions depend on this type of history! There have been millions and millions of terabytes of information gathered, but now we want to sort through it to look at how a few dozen pixels have changed from time to time. What format is it being kept in? How do we maintain the data in reliable archives? How do we index it and find it and read it when needed? The money has been spent to do the research over the years. Has it all been wasted? Yes, if those archives are not available. However, choosing how to assemble and maintain that data is a very tricky problem, and I don't believe that problem has been adequately solved.
Your saying “that 20Tb of data would only take up one cu.ft.” is a horrible simplification of the problems we have with data storage.
Again: CRU grievously violated a trust they were tasked with, and I'll never forgive their horrible judgment. The FACT that they manipulated that data in ways that have never been well vetted or understood by the science and mathematics community SHOULD have made saving the raw historical data infinitely more important, and GOOD SCIENTISTS in charge at CRU would have made it clear how critical it was to save it.
AS SOON as good scientists had realized that raw data was gone, they would have NOTIFIED THE WORLD it had been lost. Efforts would immediately have been made to gather the most primitive data still available. ALL papers that depend on manipulated data would contain cautionary notes about the uncertainty built in as a result of the data loss.
The way that Bible scholars have had to work over the years is a good example of the way GOOD scholars would work when they KNOW that the original manuscripts are not available.
Dumping the data is unforgivable, but hiding that it was dumped, and pretending that the loss of the original data is unimportant and that the massaged data has higher intrinsic value, is even worse.
Pretty damned amazing, when you think of it.