Free Republic
Browse · Search
Bloggers & Personal
Topics · Post Article

To: PA-RIVER
I thought OCR was designed to translate graphic text into computer-formatted letters. If OCR was used and letters were recognized, wouldn't the letters be perfect shapes?

My first knowledge of OCR was that its purpose is to convert image text into ASCII (American Standard Code for Information Interchange). In other words, 7-bit binary representations of alphanumeric characters. This allowed the text information to be stored using much less space, and to be easily manipulated for content.
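To put rough numbers on that space saving, here is a minimal Python sketch. The sample sentence and the 20x30-pixel, 1-bit-per-pixel character cell are assumptions picked for illustration, not measurements from any real scan.

```python
# Sample text standing in for the output of an OCR pass (an assumption).
text = "OCR turns pictures of letters into character codes."

# Every character here maps to an ASCII code below 128, so 7 bits suffice.
codes = text.encode("ascii")
assert all(code < 128 for code in codes)
seven_bit = [format(code, "07b") for code in codes]

# Hypothetical scan geometry: one 20x30 pixel cell per character, 1 bit per pixel.
CELL_W, CELL_H = 20, 30
image_bytes = len(text) * CELL_W * CELL_H // 8   # bitmap storage
text_bytes = len(codes)                          # one byte per character as stored

print(seven_bit[0])                # the 7-bit pattern for the first letter
print(image_bytes // text_bytes)   # how many times larger the bitmap is
```

Once the text is in this form it can also be searched and edited like any other string, which is the "easily manipulated" half of the point.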

That it should have evolved other uses is plausible to me, though I don't see too much benefit in using it that way. One argument that I have heard is that some software will scan a document, converting image data into ASCII (thereby creating a "text file"), and then reconstitute an image file: it finds the best image representation of each letter in the original data and uses these created-on-the-fly image tokens to draw the letters on the output document.
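As a rough illustration of that scan-then-reconstitute idea, here is a toy Python sketch. It is not taken from any real product: the tiny bitmaps, the "first bitmap seen wins" rule, and the function names are all assumptions made for the example.

```python
# Hypothetical OCR output: (recognized character, scanned bitmap) pairs.
# The two "a" bitmaps differ by one pixel, as separately typed letters would.
scanned = [
    ("a", (".##.", "#..#", "####", "#..#")),
    ("b", ("###.", "#.#.", "###.", "####")),
    ("a", (".##.", "#..#", "####", "#.##")),
]

def build_tokens(glyphs):
    """Keep the first bitmap seen for each character as its image token."""
    tokens = {}
    for char, bitmap in glyphs:
        tokens.setdefault(char, bitmap)
    return tokens

def reconstitute(glyphs, tokens):
    """Rebuild the page by drawing every character with its shared token."""
    return [tokens[char] for char, _ in glyphs]

tokens = build_tokens(scanned)
rebuilt = reconstitute(scanned, tokens)

originals_match = scanned[0][1] == scanned[2][1]   # the scan: two distinct "a" shapes
rebuilt_match = rebuilt[0] == rebuilt[2]           # the output: one shape used twice
print(originals_match, rebuilt_match)
```

Under these assumptions, every repeat of a letter in the rebuilt page is pixel-for-pixel the same image, even though the scanned originals were not.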

While I understand completely HOW this can be done, I haven't the slightest clue as to why anyone would want to do it, but as some software developers have peculiar notions about ways of doing things, it is plausible to me that someone might have actually come up with such a goofy system.

To answer your question: yeah, they ought to be perfect shapes if that is indeed what is happening. That is the big hole in the "somehow the software created artifacts" theory.

It just now occurred to me that there is one application where you might want both to create manipulable text AND to re-use the image components of the data used to create the text file: a continuing need to modify existing documents so that the modified document appears to have been typed that way originally.

In other words, say you had an old document that you needed to change something on, yet still have it look like an old document. You could scan the image file, create a lookup table of image files that correspond to the letters of the alphabet, type in your modifications to the text, and have the document reprinted with the changes reproduced using the original images of the characters from your scan.
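The lookup-table step can be sketched in a few lines of Python. This is only a toy under stated assumptions: the 3x3 bitmaps, the `glyph_table` name, and the `render` helper are hypothetical and stand in for real scanned character images.

```python
# Hypothetical glyph images cut from a scan, keyed by the character they show.
# Each bitmap is a tuple of equal-length rows; "#" is ink, "." is paper.
glyph_table = {
    "c": (".##", "#..", ".##"),
    "a": (".#.", "#.#", "###"),
    "t": ("###", ".#.", ".#."),
}

def render(text, table, gap="."):
    """Set `text` using only images from `table`, left to right on one line."""
    height = len(next(iter(table.values())))
    rows = []
    for y in range(height):
        # Join row y of each character's bitmap, with a one-pixel gap between.
        rows.append(gap.join(table[char][y] for char in text))
    return rows

edited_text = "tact"   # the typed-in modification; it reuses only scanned letters
line = render(edited_text, glyph_table)
print("\n".join(line))
```

A character that is absent from the table raises a `KeyError` here, which is the "holes in the lookup table" problem in miniature.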

This would be a perfect tool for producing images of realistic-looking modifications of old documents. Now who would have a use for such a software product? *I* *KNOW* !!!! Someone who has to create replacement birth certificates after an adoption! The Department of Health in virtually every state could use a product like this.

One hitch might be that you can't find all the letters of the alphabet (both upper and lower case) on the document you want to modify, leaving holes in your character replacement lookup table. You'll have to get the missing letters from somewhere else. Perhaps another document of a similar time period? Another hitch is that when you remove letters from an image file, or add extra letters to it, you have to overwrite existing letters or spaces on the original image file. Suppose you replace a capital "A" with a little "e". They aren't the same size, so you will have to "white out" an "A"-sized space to put your "e" over it, or you will have an "A" and an "e" superimposed on each other. This would tend to leave a "white halo" around the characters. It would probably be best just to "white out" all the space you need, and then put the replacement image data over the "white space."
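Both hitches can be shown on a toy page in Python. Everything here is an assumption for illustration: the page is a grid of characters where "#" is ink, "." is blank paper, and "~" stands in for background texture in the scan, and the function names are hypothetical.

```python
import string

def missing_letters(table):
    """Letters of either case that the scanned document never supplied."""
    return sorted(set(string.ascii_letters) - set(table))

def paste_with_whiteout(page, glyph, top, left, clear_w, clear_h):
    """Blank a clear_w x clear_h box at (top, left), then draw glyph in it."""
    rows = [list(r) for r in page]
    for y in range(top, top + clear_h):
        for x in range(left, left + clear_w):
            rows[y][x] = "."                      # the "white out" step
    for dy, glyph_row in enumerate(glyph):
        for dx, pixel in enumerate(glyph_row):
            if pixel == "#":
                rows[top + dy][left + dx] = "#"   # replacement ink
    return ["".join(r) for r in rows]

# A textured page holding a large "A" that is to become a small "e".
page = [
    "~~~~~~",
    "~~##~~",
    "~#~~#~",
    "~####~",
    "~#~~#~",
    "~~~~~~",
]
small_e = ("##", "#.")   # toy 2x2 stand-in for a lowercase letter

patched = paste_with_whiteout(page, small_e, top=1, left=1, clear_w=4, clear_h=4)
print("\n".join(patched))

# A table built from a document containing only "A" and "e" has many holes.
holes = missing_letters({"A": None, "e": None})
print(len(holes), holes[:3])
```

In the patched page the blank "." box is larger than the small letter drawn inside it and breaks the surrounding "~" texture, which is the halo effect described above.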

Hmmm. Just speculation, but it all seems to hang together pretty well. :)

175 posted on 07/19/2011 2:22:53 PM PDT by DiogenesLamp (The TAIL of Hawaiian Bureaucracy WAGS the DOG of Constitutional Law.)
[ To 157 ]


To: DiogenesLamp
My first knowledge of OCR was that its purpose is to convert image text into ASCII (American Standard Code for Information Interchange). In other words, 7-bit binary representations of alphanumeric characters.

But the purpose of this was to reproduce the text only. (Example here - I believe that the OCR-ing at the link was my work.) It IN NO WAY represents itself as a copy of the original. OCR-ing in Adobe, whose entire business is based on accurate graphical representation, is for searchability. And not even a single character is searchable in the WH PDF. There is no evidence, NONE, that that document was ever OCR-ed.
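The searchability argument can be pictured with a toy model in Python. This is not the PDF format and not how Adobe's software is implemented; the `Page` class, its `text_layer` field, and the `find` helper are hypothetical names used only to show the reasoning.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    """Toy model of a scanned page: pixels plus an optional OCR text layer."""
    image: bytes
    text_layer: Optional[str] = None   # None means no OCR text was stored

def find(page: Page, query: str) -> bool:
    """Search can only look at the text layer; it cannot read pixels."""
    if page.text_layer is None:
        return False
    return query.lower() in page.text_layer.lower()

image_only = Page(image=b"\x00\x01\x02")
ocr_page = Page(image=b"\x00\x01\x02", text_layer="Example scanned text")

print(find(image_only, "text"))   # nothing to search in an image-only page
print(find(ocr_page, "scanned"))  # the stored text layer makes this findable
```

In this model, an OCR pass done for searchability leaves a text layer behind, so a page where no character can be found by search carries no stored text layer.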

ML/NJ

227 posted on 07/20/2011 7:45:45 AM PDT by ml/nj
[ To 175 ]



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson