Posted on 07/05/2007 9:30:29 AM PDT by ShadowAce
The growing problem of accessing old digital file formats is a "ticking time bomb", the chief executive of the UK National Archives has warned.
Natalie Ceeney said society faced the possibility of "losing years of critical knowledge" because modern PCs could not always open old file formats.
She was speaking at the launch of a partnership with Microsoft to ensure the Archives could read old formats.
Microsoft's UK head Gordon Frazer warned of a looming "digital dark age".
Costly deal
He added: "Unless more work is done to ensure legacy file formats can be read and edited in the future, we face a digital dark hole."
Research by the British Library suggests Europe loses 3bn euros each year in business value because of issues around digital preservation.
The National Archives, which holds 900 years of written material, has more than 580 terabytes of data - the equivalent of 580,000 encyclopaedias - in older file formats that are no longer commercially available.
Ms Ceeney said: "If you put paper on shelves, it's pretty certain it is going to be there in a hundred years.
"If you stored something on a floppy disc just three or four years ago, you'd have a hard time finding a modern computer capable of opening it."
"Digital information is in fact inherently far more ephemeral than paper," warned Ms Ceeney.
She added: "The pace of software and hardware developments means we are living in the world of a ticking time bomb when it comes to digital preservation.
"We cannot afford to let digital assets being created today disappear. We need to make information created in the digital age to be as resilient as paper."
But Ms Ceeney said some digital documents held by the National Archives had already been lost forever because the programs which could read them no longer existed.
"We are starting to find an awful lot of cases of what has been lost. What we have got to make sure is that it doesn't get any worse."
The root cause of the problem is the range of proprietary file formats which proliferated during the early digital revolution.
Technology companies, such as Microsoft, used file formats which were not only incompatible with pieces of software from rival firms, but also between different iterations of the same program.
Mr Frazer said Microsoft had shifted its position on file formats.
"Historically within the IT industry, the prevailing trend was for proprietary file formats. We have worked very hard to embrace open standards, specifically in the area of file formats."
Microsoft has developed a new document file format, called Open XML, which is used to save files from programs such as Word, Excel and PowerPoint.
Mr Frazer said: "It's an open international standard under independent control. These are no longer under control of Microsoft and are free for access by all."
But some critics question Microsoft's approach and ask why the firm has created its own new standard, rather than adopting a rival system, called the Open Document Format.
Instead, Microsoft has released a tool which can translate between the two formats.
Ben Laurie, director of the Open Rights Group, said: "This is a well-known, standard Microsoft move.
"Microsoft likes lock-ins. Typically what happens is that you end up with two or three standards."
The agreement between the National Archives and Microsoft centres on the use of virtualisation.
The archive will be able to read older files in the format in which they were originally saved, by running emulated versions of the older Windows operating systems on modern PCs.
For example, if a Word document was saved using Office 97 under Windows 95, then the National Archives will be able to open that document by emulating the older operating system and software on a modern machine.
Ms Ceeney said the issue of older file formats was a bigger problem than reading outdated forms of media, such as floppy discs of various sizes and punch cards.
"The media it is stored in is not relevant. Back-up is important, but back-up is not preservation."
Adam Farquhar, head of e-architecture at the British Library, praised Microsoft for its adoption of more open standards.
He said: "Microsoft has taken tremendous strides forward in addressing this problem. There has been a sea change in attitude."
He warned that the issue of digital preservation did not just affect National Archives and libraries.
"It's everybody - from small businesses to university research groups and authors and scientists.
"It's a huge challenge for anyone who keeps digital information for more than 15 years because you are talking about five different technology generations."
The British Library and National Archives are members of the Planets project which brings together European National Libraries and Archives and technology companies to address the issue of digital preservation.
He said that open file formats were an important step but there was still work to be done.
"Automation is a key area to work on. We need to be able to convert hundreds and even thousands of documents at a time," he said.
That's true, but I'm more worried about the media side. The media needs to hold up physically, and there needs to be continued availability of hardware that can read it. You can address both concerns by copying all your important stuff onto new, current media from time to time.
I'm less concerned about file formats. They are pure information. And when the time comes, chances are someone will have written the necessary conversion. When Captain Kirk is born on 22 March 2233, they will still be able to read PDFs.
To take an extreme example, taxcontrol advocates imaging your hard drive periodically, thereby simultaneously backing up your data and the applications to read it. Now, when Captain Kirk arrives, chances are there won't be any PCs capable of booting up your drive image. But there will almost certainly be virtual machine applications capable of simulating a PC on the computers of 2233. They'll just need to be able to read the drive image off the physical media.
“Well, there’s always .txt.”
Is that EBCDIC, UTF-8, ASCII, ISO 8859-1, Big-5 or...
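To make the point concrete, here is a small Python sketch showing that the "same" text turns into different bytes under different encodings, and that guessing the wrong one gives either an error or garbage. The helper `try_decode` and the sample strings are made up for the illustration; `cp037` is just one of several EBCDIC code pages that Python ships a codec for.

```python
def try_decode(data, encodings):
    """Decode the same bytes under each encoding.

    Returns {encoding: text}, with None where the bytes are not valid
    in that encoding. A non-None result may still be the wrong text.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


sample = "café"
as_utf8 = sample.encode("utf-8")         # b'caf\xc3\xa9' (5 bytes)
as_latin1 = sample.encode("iso-8859-1")  # b'caf\xe9'     (4 bytes)
as_ebcdic = "txt".encode("cp037")        # not ASCII-compatible at all

# Latin-1 bytes are rejected by a UTF-8 reader...
print(try_decode(as_latin1, ["utf-8", "iso-8859-1"]))
# ...while UTF-8 bytes read as Latin-1 decode "successfully" into mojibake.
print(try_decode(as_utf8, ["iso-8859-1"]))
# Even a plain word like "txt" is unrecognisable as EBCDIC bytes.
print(as_ebcdic)
```

So a bare .txt file is only as durable as the note telling you which encoding it was saved in.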
And what will people do in 2050 if vi is deprecated in 2036?
2 words:
Taiyo Yuden
Stop buying those crapass Memorex CDs and you’ll have far fewer coasters...not all CDs/DVDs are created equal (and I have a special hatred for Memorex after 1 wasted evening)
I've been transferring all our old home movies from VHS to DVD. The VHS copies were degrading, with lots of dropouts. I was hoping the DVDs would be more robust, but now I see they're not.
So what is a good medium for preserving something as precious as videos of my kids growing up?
Mark
Maybe there is an online data storage service?
Nope, and I use it on Win32 systems too!
Mark
It’s about time our National Archives started to realize this problem, much of which is tied (as you know) to Microsoft’s proprietary formatting of data.
Thank goodness for OpenOffice and at least a few institutions recognizing the importance of data being in a completely open format.
Ektachrome slides are all red. I had that issue as well. Kodachrome slides from the ‘50’s held up pretty well.
Oh: on second read - this is the British National Archives, and they’re singing the praises of M$ - who have CAUSED a great deal of the problems of data by hiding their format “secrets”. There’s a long way to go yet... and again, I’m glad OpenOffice has released a good standard format, and that many data firms worked with them to develop something that was good.
Maybe they were taken in Pleasantville?
I have read all the comments by various posters and want to make some comments to everyone in general. I am likewise more concerned about the longevity of the storage media and the working hardware being available to read it than I am about file format compatibility. I expect some media to outlast others. I suggest a parallel backup strategy using several media types to hedge your bets. For instance, DVD-RAM and Magneto-Optical don’t use dyes and may last longer than CDs and DVDs, provided the plastic substrate does not crack from age (as I have already seen with CDs). The downside is that those technologies and their drives are proprietary. There are also archival-grade CDs and DVDs available. Mothballing new hardware for future access may not ultimately work because electronics that are not used seem to fail sooner than electronics that are left on. Why? Some electronic components, such as electrolytic capacitors (found on many circuit boards) go bad much sooner if not used.
The attached drives are cheap and are much more durable than DVDs. It will also be a lot easier to copy all of your movies for your kids from a single drive rather than dozens of DVDs, particularly when you can’t remember where you left or loaned out some of those DVDs.
I particularly like the 2.5 inch attached drives as the good ones are small enough to comfortably fit in a coat pocket and don’t require an external power source other than the USB cable. You can easily take them on a trip to share movies with an out-of-town family member and get a copy of their media files while you are at it.
Ditto on both points. I have 16mm Kodachrome movie film of my father as a kid dating back to its first year of production in 1935 -- the color appears to me to be as good as it likely ever was.
BUMP!
DVDs are much more robust than VHS. Every time you play a VHS tape, it comes in contact with the read heads and the image is degraded.
You are hearing horror stories in this thread, but CDs and DVDs are currently the best archival storage solution for an ordinary consumer.
For something really important like your children's videos, transfer those to DVD and use a very good brand. Make two or three copies (e.g. one for use, one for a backup at home and another for an off-site backup in case of fire).
Every year or two, make some additional backup copies on new media just to be safe.
It is especially important to get film photographs transferred to CD and the sooner the better while the negatives are not degraded.
Flash drives and memory cards are not currently long term storage solutions. Disk arrays (RAID) are not backup solutions either - they are availability solutions.
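If you do keep several copies and refresh them every year or two, it helps to be able to prove a copy still matches the original instead of hoping. One way, sketched below in Python under my own assumptions, is to record a SHA-256 checksum for every file and re-check each copy against that list. The function names (`sha256_of`, `build_manifest`, `verify_copy`) are invented for this example, and it only detects damage; it does not repair anything.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(root, manifest):
    """Return the relative paths that are missing or altered in this copy."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return sorted(problems)
```

Build the manifest once from the master copy, store it alongside each backup, and run `verify_copy` against every disc or drive before you rely on it or make the next generation of copies from it.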
My brother had a lot of sermons and correspondence on Epson (I think) word processor disks.
Haven’t found a way to access them and transfer the text to a more compatible format.
You should probably consider transferring the data off of the zip disks. Zip disks did suffer from the "click of death" - this is when the user hears a clicking sound which heralds the very sudden death of the zip disk.