Replies

"Maybe I am just missing something here, but why even use OCR?"

Any office that takes in multiple letters and docs a day scans them in with some form of OCR so they're searchable.

Yes, but why would the WH scan his BC using OCR? All they have to do is post an image of it. Makes no sense to me; they just wanted to put the image up on the web for all to see.

Do people here actually understand the OCR process? Because it sure doesn’t seem like they do.

OCR = Optical Character Recognition

It is the process of scanning an image (either an image file like a bmp or jpg, or a piece of paper with text on it) looking for patterns that match letters of the alphabet.

The result is a SEPARATE FILE. That separate file might be a text file or a stream of data that feeds into another application. The main point to get from this is: the original file isn’t affected in any way, shape or form.

It would be like someone taking a picture of your car (from 20 feet away); you later noticing a dent in your car door; and someone saying “that was probably a result of the photographic process.” No it wasn’t! Photographing a car won’t change a car in any way, and OCRing a document won’t change the document in any way.
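
To make the "separate output, untouched original" point concrete, here is a toy sketch in Python. It is not how any real OCR product works: the 3x3 glyph templates, the `recognize` function, and the sample image are all made up for illustration. It only shows the general idea of matching pixel patterns against known letter shapes and putting the result somewhere else.

```python
import hashlib

# Hypothetical 3x3 glyph templates: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "I": ("###", ".#.", "###"),
    "L": ("#..", "#..", "###"),
    "O": ("###", "#.#", "###"),
}

# A made-up 9x3 "scan" that spells LOI, one glyph per 3-pixel-wide cell.
SAMPLE_IMAGE = (
    "#..######",
    "#..#.#.#.",
    "#########",
)

def recognize(image_rows, width=3):
    """Cut the image into fixed-width cells and match each cell to a template."""
    text = []
    for start in range(0, len(image_rows[0]), width):
        cell = tuple(row[start:start + width] for row in image_rows)
        # Unknown shapes come out as '?', much like a low-confidence OCR guess.
        match = next((ch for ch, glyph in TEMPLATES.items() if glyph == cell), "?")
        text.append(match)
    return "".join(text)

def ocr_demo():
    """Run the toy recognizer and compare the image bytes before and after."""
    original = "\n".join(SAMPLE_IMAGE).encode("ascii")
    before = hashlib.sha256(original).hexdigest()
    text = recognize(original.decode("ascii").split("\n"))  # the separate output
    after = hashlib.sha256(original).hexdigest()
    return text, before == after

if __name__ == "__main__":
    print(ocr_demo())
```

The recognized text comes back as a new string, and the hash of the image bytes is the same before and after. The recognizer only reads the pixels; it never writes to them.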

Are people confusing OCRing and scanning? Scanning is taking a piece of paper and reading it into electronic format IN A SINGLE OPERATION. The result of scanning a document is a single image file (a bmp, jpg, etc...). It will not have parts or sections, but will be a single, undivided, unsectioned stream of bits.

A PDF file is just a container designed to contain and display pieces of data in a predefined manner. Think of it like an envelope. You put things into the envelope (image files, text, fonts, etc...), arrange them in a specific way, and then show them to people. From the outside of the envelope it looks like a single piece, but a PDF is actually an envelope that contains arranged pieces.

When people opened the Obama birth certificate PDF “envelope” and looked inside, one of the things they should have found inside was an image file (bmp, jpg, etc...)—a single image file—that was the result of the original document scan. Remember, the result of scanning a document into a computer is a SINGLE image file (bmp, jpg, etc...)

That’s what this PDF envelope should have contained—a single image file; a single “layer” of data. Instead it contained multiple image files (multiple “layers”) that were arranged to give the illusion of a single image file.
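
For anyone who wants to look inside an envelope themselves, here is a rough sketch in Python of counting the image objects in a PDF. The function names `count_image_objects` and `fake_pdf` are mine, and `fake_pdf` only builds stand-in bytes for the demo, not a complete, valid PDF. The check itself is crude: it searches the raw bytes for `/Subtype /Image`, so it will miss images whose dictionaries sit inside compressed object streams. A proper PDF library would be needed for a reliable count.

```python
import re

# Image XObjects in a PDF carry "/Subtype /Image" in their dictionary.
IMAGE_OBJECT = re.compile(rb"/Subtype\s*/Image\b")

def count_image_objects(pdf_bytes):
    """Count image dictionaries visible in the raw (uncompressed) PDF bytes."""
    return len(IMAGE_OBJECT.findall(pdf_bytes))

def fake_pdf(n_images):
    """Build stand-in bytes with n image objects (demo only, not a valid PDF)."""
    parts = [b"%PDF-1.4\n"]
    for i in range(n_images):
        parts.append(
            b"%d 0 obj\n<< /Type /XObject /Subtype /Image /Width 1 /Height 1 >>\nendobj\n"
            % (i + 1)
        )
    parts.append(b"%%EOF\n")
    return b"".join(parts)

if __name__ == "__main__":
    print(count_image_objects(fake_pdf(1)))  # one image in the envelope
    print(count_image_objects(fake_pdf(3)))  # several images in the envelope
```

To try it on a real file, read it with `open(path, "rb").read()` and pass the bytes in. The count only tells you how many image objects are visible in the envelope; it does not tell you how they got there.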

That is what should be sending up red flags.

>>”Maybe I am just missing something here, but why even use OCR?”
>
>Any office that takes in multiple letters and docs a day scans them in with some form of OCR so they’re searchable.

And yet this document was not “taken in” but “put out.”
Furthermore, there is no reason to make it searchable because all the [printed] text is that of a standard form (and OCR doesn’t do that well against handwriting, especially when it’s “untrained” on the writer).