Free Republic

To: Ha Ha Thats Very Logical
Like I said before, the 'R' wasn't read as a letter. For whatever reason--probably because it was lighter than the other letters--it was read as part of the background, like a gray smudge, and downsampled along with the rest of it.

The trouble with this theory is that it ended up in a string of text where a letter is supposed to be. If the program couldn't recognize it as a letter, why did it put it in the middle of a string of text?

226 posted on 07/20/2011 7:44:45 AM PDT by DiogenesLamp (The TAIL of Hawaiian Bureaucracy WAGS the DOG of Constitutional Law.)


To: DiogenesLamp
The trouble with this theory is that it ended up in a string of text where a letter is supposed to be. If the program couldn't recognize it as a letter, why did it put it in the middle of a string of text?

The point of the document archiving process is to let you search for a document by the text within it while still retrieving an exact copy/image of the document. That's why the OCR'd text is placed in an invisible layer on the PDF--the search function can operate on the text without the text getting in the way of the exact copy of the image.

So the image would still have all the letters in their original positions. It didn't "put" the 'R' in the middle of a string of text; that's just where it was on the original form. But because it was fainter than the other letters, it was left as part of the background rather than being extracted with the rest of the text. That means it was downsampled along with the rest of the background when the PDF was optimized. (It would also mean that if the father's name weren't there, someone searching for a document containing 'Barack' wouldn't find this, while someone searching for 'ack' would.)
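Here's a toy model of that explanation in Python. It assumes a simple darkness cut-off decides what counts as "text" versus "background". The actual optimizer's segmentation rule isn't known here, and `THRESHOLD`, `segment`, `downsample`, and the darkness values are all made up for illustration:

```python
# Toy model (hypothetical): each glyph has a darkness from 0 (white) to 255 (black).
# Glyphs at or above THRESHOLD go to the OCR text layer; fainter ones stay in the
# background image, keeping their original position on the form.
THRESHOLD = 128  # assumed cut-off, not taken from any real PDF optimizer


def segment(glyphs, threshold=THRESHOLD):
    """Split (char, darkness) pairs into OCR'd text and leftover background glyphs."""
    text = []
    background = []
    for position, (char, darkness) in enumerate(glyphs):
        if darkness >= threshold:
            text.append(char)
        else:
            background.append((position, char))  # still drawn in the image, just not OCR'd
    return "".join(text), background


def downsample(row, factor=2):
    """Average each block of `factor` pixels, roughly what a background downsampler does."""
    return [sum(row[i:i + factor]) // len(row[i:i + factor])
            for i in range(0, len(row), factor)]


# "BARACK" with a faint 'R' (darkness values invented for the example)
name = [("B", 200), ("A", 210), ("R", 90), ("A", 205), ("C", 215), ("K", 200)]
ocr_text, leftovers = segment(name)

print(ocr_text)                        # BAACK
print(leftovers)                       # [(2, 'R')]
print("barack" in ocr_text.lower())    # False: a text search for the name misses it
print("ack" in ocr_text.lower())       # True: a search for the fragment still hits
print(downsample([90, 0, 90, 0]))      # [45, 45]: the faint stroke gets smeared
```

In this model the 'R' never leaves its spot in the image. It just never makes it into the searchable text, and the background pass averages its pixels into a gray smudge.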

273 posted on 07/20/2011 12:09:57 PM PDT by Ha Ha Thats Very Logical



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson