Free Republic
Browse · Search
Smoky Backroom
Topics · Post Article

To: Brown Deer
The Library of Congress has been digitizing old books in their collection for quite some time. The building I worked in had some of Schwarzkopf's intelligence folks "doing stuff". The father of one of those guys was deeply involved in that digitizing effort.

He was always good for a tale or two about the Library of Congress. He related that the major problem was figuring out how to READ the old stuff, so that scanned text in ancient fonts or lead type of all sorts could be recognized.

I am not an expert in this, but the basic idea is that you "read" the text in a number of places in order to identify the full set of characters that appear in the document.

That data set is then matched up with standard OCR programming.
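
A rough sketch of that sampling idea, in Python. Everything here is an assumption for illustration: the glyph "signatures" are plain strings standing in for scanned letter shapes, and the function names are made up, not anything the LOC is known to use.

```python
from typing import Dict, List, Set

# Hypothetical: a page is a list of glyph signatures in reading order.
Page = List[str]


def collect_glyph_set(pages: List[Page], sample_every: int = 2) -> Set[str]:
    """Sample every Nth page and gather the distinct glyph shapes seen."""
    seen: Set[str] = set()
    for page in pages[::sample_every]:
        seen.update(page)
    return seen


def transcribe(page: Page, glyph_to_char: Dict[str, str]) -> str:
    """Map each glyph to a modern character; unknown shapes become '?'."""
    return "".join(glyph_to_char.get(glyph, "?") for glyph in page)


pages = [
    ["long_s", "o", "n", "g"],
    ["m", "o", "s", "t"],
    ["l", "o", "long_s", "t"],
]

# Only pages 0 and 2 are sampled here.
glyphs = collect_glyph_set(pages)

# Someone labels each sampled shape once; the old "long s" maps to a modern "s".
glyph_to_char = {g: ("s" if g == "long_s" else g) for g in glyphs}

first_page = transcribe(pages[0], glyph_to_char)
skipped_page = transcribe(pages[1], glyph_to_char)
```

The skipped page shows the catch: any shape that never turned up in the samples comes out as "?", so you have to sample in enough places to cover the whole character set.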

You call up the ancient book, the OCR process kicks in, and it reads the text in the native fonts just as if it were all in modern standard fonts.

They then associate the full OCR'd text with the visible text, and that enables you to quickly find material in the book while thinking you are reading it in the native printing.
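
Tying the OCR'd text to the visible page can be pictured as a word index that remembers where each word sits on the scanned image. This is only a sketch under assumptions: the `Word` record, its coordinate fields, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    """One OCR'd word plus where it appears on the scanned image (hypothetical layout)."""
    text: str
    page: int
    x: int
    y: int


def build_index(words: List[Word]) -> Dict[str, List[Word]]:
    """Group word records by lowercased text so searches ignore case."""
    index: Dict[str, List[Word]] = defaultdict(list)
    for word in words:
        index[word.text.lower()].append(word)
    return dict(index)


def find(index: Dict[str, List[Word]], term: str) -> List[Word]:
    """Return every place the term appears, or an empty list."""
    return index.get(term.lower(), [])


ocr_words = [
    Word("Liberty", page=1, x=120, y=80),
    Word("and", page=1, x=210, y=80),
    Word("liberty", page=7, x=40, y=300),
]

index = build_index(ocr_words)
hits = find(index, "LIBERTY")
```

The viewer would then jump to the page and highlight the spot on the original image, so the reader still sees the native printing.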

The LOC was then able to do a quick and dirty pass on a vast number of books, with a complete "read" processed ahead of time only for commonly accessed works. If someone wanted to look at an infrequently read document, the OCR software would do the job for you on demand.
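
That "OCR the popular stuff up front, do the rest when asked" approach is basically a cache. A minimal sketch, assuming a made-up `LazyOcrStore` class and a fake OCR function that only records when it was called:

```python
from typing import Callable, Dict, Iterable, List


class LazyOcrStore:
    """Pre-processes popular books; OCRs everything else on first request."""

    def __init__(self, ocr: Callable[[str], str], popular: Iterable[str]) -> None:
        self._ocr = ocr
        # Commonly accessed works get the full "read" ahead of time.
        self._cache: Dict[str, str] = {book_id: ocr(book_id) for book_id in popular}

    def text(self, book_id: str) -> str:
        # Rarely read works are OCR'd only when somebody asks for them.
        if book_id not in self._cache:
            self._cache[book_id] = self._ocr(book_id)
        return self._cache[book_id]


calls: List[str] = []


def fake_ocr(book_id: str) -> str:
    """Stand-in for a real OCR run; logs each call so the cost is visible."""
    calls.append(book_id)
    return f"text of {book_id}"


store = LazyOcrStore(fake_ocr, popular=["federalist-papers"])
store.text("federalist-papers")  # already processed at startup
store.text("obscure-almanac")    # OCR runs now, on first request
store.text("obscure-almanac")    # second request reuses the stored result
```

Each book gets OCR'd at most once, and the expensive work for obscure titles is put off until someone actually wants them.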

The idea was that eventually everything would be OCR'd, but all in due course, and as cheaply as possible within the framework of the LOC budget.

Indiana University was in the business of digitizing a vast number of pictures ~ they had one of the world's premier collections. Yale is doing that with their unique sets. Presumably other universities are doing the same thing.

There the problem is that you need a deck of supercomputers on hand to handle the data stream for compression.

438 posted on 06/08/2011 6:04:24 PM PDT by muawiyah


To: muawiyah

http://www.youtube.com/watch?v=ExW64zOZGoI


454 posted on 06/08/2011 6:28:22 PM PDT by Brown Deer (Pray for 0bama. Psalm 109:8)



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson