Posted on 07/05/2007 9:30:29 AM PDT by ShadowAce
The growing problem of accessing old digital file formats is a "ticking time bomb", the chief executive of the UK National Archives has warned.
Natalie Ceeney said society faced the possibility of "losing years of critical knowledge" because modern PCs could not always open old file formats.
She was speaking at the launch of a partnership with Microsoft to ensure the Archives could read old formats.
Microsoft's UK head Gordon Frazer warned of a looming "digital dark age".
He added: "Unless more work is done to ensure legacy file formats can be read and edited in the future, we face a digital dark hole."
Research by the British Library suggests Europe loses 3bn euros each year in business value because of issues around digital preservation.
The National Archives, which holds 900 years of written material, has more than 580 terabytes of data - the equivalent of 580,000 encyclopaedias - in older file formats that are no longer commercially available.
Ms Ceeney said: "If you put paper on shelves, it's pretty certain it is going to be there in a hundred years.
"If you stored something on a floppy disc just three or four years ago, you'd have a hard time finding a modern computer capable of opening it."
"Digital information is in fact inherently far more ephemeral than paper," warned Ms Ceeney.
She added: "The pace of software and hardware developments means we are living in the world of a ticking time bomb when it comes to digital preservation.
"We cannot afford to let digital assets being created today disappear. We need to make information created in the digital age to be as resilient as paper."
But Ms Ceeney said some digital documents held by the National Archives had already been lost forever because the programs which could read them no longer existed.
"We are starting to find an awful lot of cases of what has been lost. What we have got to make sure is that it doesn't get any worse."
The root cause of the problem is the range of proprietorial file formats which proliferated during the early digital revolution.
Technology companies, such as Microsoft, used file formats which were not only incompatible with pieces of software from rival firms, but also between different iterations of the same program.
Mr Frazer said Microsoft had shifted its position on file formats.
"Historically within the IT industry, the prevailing trend was for proprietary file formats. We have worked very hard to embrace open standards, specifically in the area of file formats."
Costly deal
Microsoft has developed a new document file format, called Open XML, which is used to save files from programs such as Word, Excel and PowerPoint.
Mr Frazer said: "It's an open international standard under independent control. These are no longer under control of Microsoft and are free for access by all."
But some critics question Microsoft's approach and ask why the firm has created its own new standard, rather than adopting a rival system, called the Open Document Format.
Instead, Microsoft has released a tool which can translate between the two formats.
Ben Laurie, director of the Open Rights Group, said: "This is a well-known, standard Microsoft move.
"Microsoft likes lock-ins. Typically what happens is that you end up with two or three standards."
The agreement between the National Archives and Microsoft centres on the use of virtualisation.
The archive will be able to read older files in the format in which they were originally saved by running emulated versions of the older Windows operating systems on modern PCs.
For example, if a Word document was saved using Office 97 under Windows 95, then the National Archives will be able to open that document by emulating the older operating system and software on a modern machine.
Ms Ceeney said the issue of older file formats was a bigger problem than reading outdated forms of media, such as floppy discs of various sizes and punch cards.
"The media it is stored in is not relevant. Back-up is important, but back-up is not preservation."
Adam Farquhar, head of e-architecture at the British Library, praised Microsoft for its adoption of more open standards.
He said: "Microsoft has taken tremendous strides forward in addressing this problem. There has been a sea change in attitude."
He warned that the issue of digital preservation did not just affect National Archives and libraries.
"It's everybody - from small businesses to university research groups and authors and scientists.
"It's a huge challenge for anyone who keeps digital information for more than 15 years because you are talking about five different technology generations."
The British Library and National Archives are members of the Planets project which brings together European National Libraries and Archives and technology companies to address the issue of digital preservation.
He said that open file formats were an important step but there was still work to be done.
"Automation is a key area to work on. We need to be able to convert hundreds and even thousands of documents at a time," he said.
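The kind of bulk conversion described above can be sketched roughly as follows. This is only an illustration, not anything the British Library or Planets project is known to use: the function name `convert_batch` and its parameters are hypothetical, and the actual converter is left to the caller because it depends entirely on the formats involved.

```python
from pathlib import Path


def convert_batch(source_dir, target_dir, extensions, convert):
    """Apply convert(src, dst) to every matching file under source_dir.

    `extensions` is a set of lowercase suffixes such as {".doc"}.
    `convert` is a caller-supplied function (an assumption of this sketch).
    Returns (converted, failed), where failed holds (path, exception) pairs.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    converted, failed = [], []
    # sorted() lists the tree up front, so files written during the run
    # are not picked up again
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        # Mirror the source layout under the target directory
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert(src, dst)
            converted.append(src)
        except Exception as exc:
            # One unreadable document should not stop the whole batch
            failed.append((src, exc))
    return converted, failed
```

Collecting failures instead of stopping at the first one matters at this scale: with thousands of documents, the list of files that could not be converted is itself the record of what still needs attention.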
Well, that's quite obvious. It's much easier to destroy a compact disc than to shred an entire encyclopaedia set which is equivalent to the amount of data the disc is capable of containing.
Well, there’s always .txt.
Even with these fancy formats, you can always cat them on a Unix box, or edit them in vi.
If I wrote a telephone number on it - shelf life won't reach 100 seconds.
LOL! My thoughts exactly
This is a very serious issue, complicated by the fact that there is a growing tendency to require businesses to retain records for longer periods of time. This is often an issue not realized by the business until too late.
My recommendation to a business that faces a 5 year or greater records retention need is to create a “record” - actually an image of the standard operating system loaded with all of the applications necessary to read any of the file formats for that year. Then, during the annual records archival process, store that image along with the data.
This then creates an image for each year stored on a CD or DVD. But what happens for systems say 10 years down the road when CD players are no longer available? While this does tend to solve the software issues, what do you do about hardware compatibility?
I know one company that buys new PC hardware every 2 years and replaces 1/3 of the company's PCs at that time. What they do is buy an extra system and mothball it right away in their archives room. Brand new PC, sitting there doing nothing .... just for the purpose of having a working system to read old files / software programs.
We just need a format in which the document does not have tons of arcane formatting codes. The simpler the better.
With the price/gigabyte falling as fast as it has the past few years, why depend on CD? A good SAN with RAIDed drives should keep data safe. As one HDD in the SAN fails, merely replace it. The RAID system will rebuild the new drive without any fuss.
How common is record retention on CD or DVD? I thought many backups were on hard disk and/or tape.
Some programs are better than others at this. The current WordPerfect can still read early ‘80s WordStar. Even a brand new machine can boot into DOS and run old copies of MultiMate. For the really hard up, there are conversion services. We’ve been through this before . . . punch cards, DEC tape, paper tape, anybody? How about 9-track? The really important stuff has been moved onto optical disc already. You can find old CP/M, Atari and Commodore stuff on FTP servers anywhere. The old Unix stuff never even went away.
Currently, we have a newly mature industry. Yeah, we have troubles reading old Compugraphic diskettes, and forget those 160MB tape cartridges! However, PDF, HTML and XML are industry-wide standards. PDF in particular is not going anywhere for a long, LONG time (for one thing, the US Gov issues docs in that format, and the readers are free).
The same problem exists with photos. Photos taken during the Civil War are more permanent than those taken since the fifties. Inks fade and disappear whereas those old silver emulsions never fade.
I recently transferred about 800 Kodachrome and Ektachrome slides taken in Viet Nam to digital format. About 2/3 of them were OK, but the remaining 1/3 were color faded in all but the reds. By faded, I mean there was no trace of any color except red.
Apparently, this woman has never heard of Sandy Berger.
Wow, another vi user!! Thought I was the last one! ;o)
Data is normally kept on tape - true. But most OS or system image software is currently designed to burn ISO images to CD or DVD.
Are zip drives now dead? I still have backups stored on zip disks. I would presume the external drives can still be found and installed on a port?
Have you ever had stuff on a CD or DVD?
The recording is made on a laser-sensitive dye backed with aluminum. Take a CD or DVD and put a piece of tape on the label side. Leave it a day or two, then remove the tape. The tape will pull off the label and the aluminum layer, and the disk will be shot. A big gouge on the record side is easily buffed out and repaired; a tiny scratch on the label side (like writing with a ballpoint pen) will render the disk unusable. The atmosphere can slowly corrode the aluminum right through the label, rendering the disk useless in a few years. Any time you record a CD or DVD, make two copies, because you don’t want to depend on these disks to archive data.
vi is the only thing you can count on when you log into a strange Unix box.
If you are brought in a lot to look at others’ problems, it’s the only way to go. That, and ‘find . -type f | xargs grep something’
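For a box where even that pipeline is awkward (odd filenames, no xargs), here is a rough Python equivalent of the find-and-grep idea. It's just a sketch: the name `grep_tree` is made up for illustration, and it does a plain substring match rather than a regex.

```python
import os


def grep_tree(root, needle):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files from
                # raising a decoding error partway through
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, vanished): skip and move on
                continue
```

Usage would be something like `for path, number, line in grep_tree(".", "something"): print(path, number, line)`.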
Nope--I'm here also.
However, it's much easier to have a redundant off-site copy of every CD you have than of every book.