Free Republic
Browse · Search
Bloggers & Personal
Topics · Post Article

To: Marie
Try this:

http://www.dclab.com/pdfconversion3.asp

When you scan a document into a PDF, the software can actually pull the image apart into different layers. It can recognize text, images, background, etc. and organize them into separate layers. The most important of these is the text layer, which relies on a technology called Optical Character Recognition (OCR) to convert what it sees on the page into actual text that can then be searched. Governments and archives with massive amounts of old paper documents (deeds, birth certs, legal paperwork, and everything else you can imagine) that need to be scanned into computer systems use this, because if you just scan a document and save it as an image or picture file, your computer isn't smart enough to tell that it's looking at text; all it sees is a picture.
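Here is a rough sketch of why the text layer matters for searching. Everything in it is made up for illustration (the `ScannedPage` class, the file names, the sample text); it is not how any particular scanner or PDF program stores things.

```python
from dataclasses import dataclass


@dataclass
class ScannedPage:
    """Hypothetical model of one scanned page."""
    name: str
    image: bytes          # raw pixel data; the computer just sees bytes here
    text_layer: str = ""  # filled in by OCR; empty for an image-only scan


def search(pages, query):
    """Return the names of pages whose text layer contains the query."""
    q = query.lower()
    return [p.name for p in pages if q in p.text_layer.lower()]


# Two scans of the same (made-up) page: one saved as a plain picture,
# one saved with an OCR text layer.
archive = [
    ScannedPage("deed_image_only.pdf", image=b"\x00\xff\x10"),
    ScannedPage("deed_with_ocr.pdf", image=b"\x00\xff\x10",
                text_layer="DEED OF SALE"),
]

print(search(archive, "deed"))
```

Only the scan that carries a text layer turns up in the search, even though both files hold the same picture.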

For example, if you take a picture of a stop sign and load it into your computer, you can't then search your documents and files for the word 'stop' and expect the computer to find it. Likewise, if you take a picture of a printed form with your name on it and load it as a picture, you can't search for your name and have the picture file show up. All the computer sees is bits of data that tell it what to display in a certain picture format. It doesn't see any difference between a printed letter 'T' and a picture of a cross. That's why, when you scan a document with OCR turned on, the software uses an algorithm to detect the text so that it becomes searchable.
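To make the stop sign example concrete, here is a toy version of the idea. It is only a sketch: real OCR is far more involved, and the tiny 3x5 letter shapes and the function names (`render`, `recognize`) are my own inventions for the demo. It compares each block of pixels against known letter shapes and picks the closest match.

```python
# Tiny 3x5 pixel shapes for four letters ('#' = dark pixel, '.' = light pixel).
GLYPHS = {
    "S": ["###", "#..", "###", "..#", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "P": ["###", "#.#", "###", "#..", "#.."],
}
WIDTH, HEIGHT, GAP = 3, 5, 1


def render(word):
    """Draw a word as a 'picture': a list of pixel rows."""
    return [("." * GAP).join(GLYPHS[ch][row] for ch in word)
            for row in range(HEIGHT)]


def distance(a, b):
    """Count the pixels that differ between two equally sized shapes."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(bitmap):
    """Toy OCR: match each letter-sized cell to the closest known shape."""
    text = []
    for left in range(0, len(bitmap[0]), WIDTH + GAP):
        cell = [row[left:left + WIDTH] for row in bitmap]
        text.append(min(GLYPHS, key=lambda ch: distance(cell, GLYPHS[ch])))
    return "".join(text)


sign = render("STOP")
print("\n".join(sign))            # the picture: just pixels
print("STOP" in "".join(sign))    # searching the raw pixels finds nothing
print(recognize(sign))            # after the toy OCR step there is real text
```

The picture itself never contains the word; only the recognition step produces something you can search.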

What happened here is that whoever scanned the image of the printed document forgot to turn the OCR setting off, so when the software scanned Obama's cert, it tried to break it out into different layers for background image, high-res images, text, and so on. This setting was probably turned on at the Hawaii records office because they most likely do use OCR to scan old documents into digital formats.

The problem is that OCR isn't always accurate, hence the discrepancies in the layers. You can tell that this isn't the same as Photoshop layers, where multiple images sit on top of one another, because the background image layer has white shadows where the text has been lifted out into a different layer. The software isn't smart enough to know what is behind every typed letter or signature, so it leaves those spots blank. That's why you get white outlines of all the text and boxes on the background layer, and why the background layer itself looks like someone went through and wrote everything in white-out.
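A small sketch of the white-shadow effect, under simplifying assumptions: the page is a grid of characters where 'g' stands for the patterned background and '#' for dark text pixels, and the "software" just treats every dark pixel as text. The names `split_layers` and `recombine` are made up for this demo and do not come from any real scanning program.

```python
BLANK = " "  # stands in for the white left behind when a pixel is lifted out


def split_layers(page, text_pixel="#"):
    """Lift text pixels into their own layer; leave blanks in the background."""
    text_layer = ["".join(p if p == text_pixel else BLANK for p in row)
                  for row in page]
    background = ["".join(BLANK if p == text_pixel else p for p in row)
                  for row in page]
    return text_layer, background


def recombine(text_layer, background):
    """Stack the text layer back on top of the background layer."""
    return ["".join(t if t != BLANK else b for t, b in zip(trow, brow))
            for trow, brow in zip(text_layer, background)]


# A letter 'T' printed on a patterned background.
page = [
    "ggggggg",
    "g###ggg",
    "gg#gggg",
    "gg#gggg",
    "ggggggg",
]

text_layer, background = split_layers(page)
print("\n".join(background))  # the 'T' shows up as a blank hole in the pattern
```

The background layer ends up with a blank hole in the shape of the letter because the software has no idea what pattern was underneath the ink. Stack the two layers again with `recombine` and you get the original page back.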

Make sense?

260 posted on 04/27/2011 3:57:47 PM PDT by GunRunner (10 Years of Freeping...)


To: GunRunner

Yes. Your explanation does make sense to me!

Thank you for your patience with this techno-tard.

(I almost said, “you lost me at, ‘When you scan a document into a PDF...’” but then I continued to read and it all fell together.)

I still wish someone would post a demonstration for the others to see.


265 posted on 04/27/2011 4:26:11 PM PDT by Marie (Obama seems to think that Jerusalem has been the capital of Israel since Camp David, not King David)

To: GunRunner

GunRunner, I truly respect your knowledge and I would really appreciate your opinion of this:

http://market-ticker.org/cgi-ticker/akcs-www?post=185094

If you have the time and the inclination, would you please share your thoughts?

Marie


266 posted on 04/27/2011 4:30:39 PM PDT by Marie (Obama seems to think that Jerusalem has been the capital of Israel since Camp David, not King David)

To: GunRunner

That actually makes a LOT of sense, GR, as an explanation of why there is a bit of whitish outline around the letters. Thanks for the explanation!


268 posted on 04/27/2011 4:47:49 PM PDT by Jeff Winston

To: GunRunner

Thank you for that very good explanation.

And congrats on your first decade of FReeping! I am on my second decade now too.


286 posted on 04/27/2011 9:22:39 PM PDT by Yaelle

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson