Free Republic
Browse · Search
Bloggers & Personal
Topics · Post Article

To: coon2000
"Maybe I am just missing something here, but why even use OCR?"

Any office that takes in multiple letters and docs a day scans them in with some form of OCR so they're searchable.

29 posted on 05/04/2011 11:57:16 AM PDT by moehoward
[ To 22 ]


To: moehoward

Yes, but why would the WH scan his BC using OCR? All they have to do is post an image of it. Makes no sense to me; they just wanted to put the image up on the web for all to see.


51 posted on 05/04/2011 2:37:44 PM PDT by coon2000 (Give me Liberty or give me death!)
[ To 29 ]

To: moehoward; coon2000

Do people here actually understand the OCR process? Because it sure doesn’t seem like they do.

OCR = Optical Character Recognition

It is the process of scanning an image (either an image file like a bmp or jpg, or a piece of paper with text on it) looking for patterns that match letters of the alphabet.

The result is a SEPARATE FILE. That separate file might be a text file or a stream of data that feeds into another application. The main point to get from this is: the original file isn’t affected in any way, shape or form.

It would be like someone taking a picture of your car (from 20 feet away); you later noticing a dent in your car door; and someone saying “that was probably a result of the photographic process.” No it wasn’t! Photographing a car won’t change a car in any way, and OCRing a document won’t change the document in any way.
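To make the "separate output" point concrete, here is a toy sketch in Python. It is not how a real OCR engine works: it only matches tiny 3x5 character bitmaps against exact templates, and the names GLYPHS and toy_ocr are made up for this illustration. What it does show is the shape of the process: the recognizer reads the image and hands back new text, and the image itself is the same afterward.

```python
# Toy illustration only; real OCR engines are far more sophisticated.
# GLYPHS and toy_ocr are hypothetical names invented for this sketch.
GLYPHS = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}

def toy_ocr(image, width=3):
    """Return recognized text as a new string; the image is only read."""
    text = []
    # Glyphs are `width` columns wide and separated by one blank column.
    for left in range(0, len(image[0]), width + 1):
        cell = tuple(row[left:left + width] for row in image)
        match = next((ch for ch, pat in GLYPHS.items() if pat == cell), "?")
        text.append(match)
    return "".join(text)

# A 5-row "scan" holding the glyphs for L, I and T.
scan = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...#.",
    "###.###..#.",
]
before = list(scan)          # copy of the pixels before recognition
recognized = toy_ocr(scan)   # separate output

print(recognized)            # LIT
print(scan == before)        # True: the scan is unchanged
```

The recognized text lives in its own variable, just as real OCR output lives in its own file or data stream.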

Are people confusing OCRing and scanning? Scanning is taking a piece of paper and reading it into electronic format IN A SINGLE OPERATION. The result of scanning a document is a single image file (a bmp, jpg, etc...). It will not have parts or sections, but will be a single, undivided, unsectioned stream of bits.

A PDF file is just a container designed to contain and display pieces of data in a predefined manner. Think of it like an envelope. You put things into the envelope (image files, text, fonts, etc...), arrange them in a specific way, and then show them to people. From the outside of the envelope it looks like a single piece, but a PDF is actually an envelope that contains arranged pieces.

When people opened the Obama birth certificate PDF “envelope” and looked inside, one of the things they should have found inside was an image file (bmp, jpg, etc...), a single image file, that was the result of the original document scan. Remember, the result of scanning a document into a computer is a SINGLE image file (bmp, jpg, etc...).

That’s what this PDF envelope should have contained—a single image file; a single “layer” of data. Instead it contained multiple image files (multiple “layers”) that were arranged to give the illusion of a single image file.
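For anyone who wants to look inside the envelope themselves, here is a rough sketch in Python of one way to count the image objects in a PDF. It relies on the fact that image objects in a PDF are dictionaries marked /Subtype /Image. The function name count_image_objects and the sample bytes are invented for this example, and the approach is deliberately crude: it only sees dictionaries stored as plain text in the file, so images tucked inside compressed object streams, or inline images, would be missed.

```python
import re

def count_image_objects(pdf_bytes: bytes) -> int:
    """Count dictionaries marked as image XObjects in raw PDF bytes.

    Crude by design: entries inside compressed object streams and
    inline images are not seen by this byte-level search.
    """
    return len(re.findall(rb"/Subtype\s*/Image\b", pdf_bytes))

# Hypothetical hand-written fragment standing in for a real file's bytes.
sample = (
    b"1 0 obj << /Type /XObject /Subtype /Image /Width 8 /Height 8 >> endobj\n"
    b"2 0 obj << /Type /XObject /Subtype /Image /Width 4 /Height 4 >> endobj\n"
    b"3 0 obj << /Type /Font /Subtype /Type1 >> endobj\n"
)

print(count_image_objects(sample))  # 2
```

On a real file you would pass in the bytes read from disk instead of the sample. A count of one is what the single-scan argument above expects; a higher count means the envelope holds more than one image.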

That is what should be sending up red flags.


52 posted on 05/04/2011 2:44:42 PM PDT by Brookhaven (Moderates = non-thinkers)
[ To 29 ]

To: moehoward

>>“Maybe I am just missing something here, but why even use OCR?”
>
>Any office that takes in multiple letters and docs a day, scans them in with some form of OCR so they’re searchable.

And yet this document was not “taken in” but “put out.”
Furthermore, there is no reason to make it searchable because all the [printed] text is that of a standard form (and OCR doesn’t do that well against handwriting, especially when it’s “untrained” on the writer).


90 posted on 05/05/2011 9:19:53 AM PDT by OneWingedShark (Q: Why am I here? A: To do Justly, to love mercy, and to walk humbly with my God.)
[ To 29 ]



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson