
Try this:

http://www.dclab.com/pdfconversion3.asp

When you scan a document into a PDF, the software can actually pull the image apart into different layers. It can recognize text, images, background, etc., and organize them into separate layers. The most important of these is the text layer, which is produced by a technology called Optical Character Recognition (OCR): it tries to convert what it's scanning on the page into actual text that can then be searched. Governments and archives with massive amounts of old paper documents (deeds, birth certs, legal paperwork, and everything else you can imagine) that need to be scanned into computer systems use OCR, because if you just scan a document and save it as an image or picture file, your computer isn't smart enough to tell that it's looking at text; all it sees is a picture.

For example, if you take a picture of a stop sign and load it into your computer, you can't then search your documents and files for the word 'stop' and expect the computer to find it. Likewise, if you take a picture of a printed form with your name on it and load it as a picture, you can't search for your name and have that picture file show up. All the computer sees is data telling it which pixels to display in a given picture format. It doesn't see any difference between a printed letter 'T' and a picture of a cross. That's why, when you scan a document with OCR turned on, the software uses an algorithm to detect the text so it can be searched.
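To make the stop-sign point concrete, here is a toy Python sketch. Everything in it is a hypothetical stand-in: the 5x5 "glyphs", the `render`, `ocr` and `search` helpers, and the exact-match "recognition", which is far cruder than what real OCR engines do. It only shows that a search for "STOP" finds nothing in the picture's pixel data, but does find it once a recognition step has produced a text layer.

```python
# Toy model (hypothetical): a "scanned image" is just rows of pixels,
# "#" = ink and "." = paper. Nothing in those pixels says "this is a letter".
GLYPHS = {  # hypothetical 5x5 templates that our toy OCR "knows"
    "S": ("#####", "#....", "#####", "....#", "#####"),
    "T": ("#####", "..#..", "..#..", "..#..", "..#.."),
    "O": ("#####", "#...#", "#...#", "#...#", "#####"),
    "P": ("#####", "#...#", "#####", "#....", "#...."),
}


def render(word):
    """Draw a word as a picture: 5 pixel rows, glyphs separated by a 1-px gap."""
    return [".".join(GLYPHS[ch][row] for ch in word) for row in range(5)]


def ocr(image):
    """Toy OCR: cut the picture into 5-px-wide cells and match each one
    exactly against a known glyph; unknown shapes become '?'."""
    text = []
    for x in range(0, len(image[0]), 6):  # 5-px glyph + 1-px gap
        cell = tuple(row[x:x + 5] for row in image)
        match = next((ch for ch, glyph in GLYPHS.items() if glyph == cell), "?")
        text.append(match)
    return "".join(text)


def search(data, word):
    """A search tool can only match characters, never shapes."""
    return word in data


image = render("STOP")
print(search("\n".join(image), "STOP"))  # False: the picture holds only pixels
text_layer = ocr(image)
print(text_layer)                         # STOP
print(search(text_layer, "STOP"))         # True: the OCR text layer is searchable
```

The search function is deliberately dumb: it just looks for characters, which is roughly what a file search does. The only thing that changes the outcome is whether a recognition step has turned the pixels into characters first.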

What happened here is that whoever scanned the printed document forgot to turn the OCR setting off, so when Obama's cert was scanned, the software tried to break it out into separate layers for the background image, high-res images, text, and so on. This setting was probably turned on at the Hawaii records office, because they most likely do use OCR to scan old documents into digital formats.

The problem is that OCR isn't always accurate, hence the discrepancies in the layers. You can tell this isn't the same as Photoshop layers, where multiple images sit on top of one another, because the background image layer has white shadows where the text has been lifted out into a different layer. The software isn't smart enough to know what is behind every typed letter or signature, so it leaves those spots blank. That's why you see white outlines of all the text and boxes on the background layer, and why the background layer itself looks like someone went through and painted over everything with white-out.
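The white outlines can be mimicked the same way. This is a toy sketch under loudly stated assumptions, not how any particular scanner or PDF program actually separates layers: the scan is a grid of grey values (0 = black ink, 255 = white paper), anything darker than a made-up `THRESHOLD` counts as "text", and the software has no idea what the paper looked like under the ink, so it fills those spots with white. `split_layers` and `flatten` are hypothetical helper names.

```python
# Toy layer split (hypothetical). Grey levels: 0 = black ink, 255 = white paper.
THRESHOLD = 100  # assumed cut-off between "ink" and "paper"
WHITE = 255


def split_layers(scan):
    """Move dark pixels to a text layer; leave white holes in the background."""
    text_layer, background = [], []
    for row in scan:
        text_row, bg_row = [], []
        for px in row:
            if px < THRESHOLD:
                text_row.append(px)    # ink goes to the text layer
                bg_row.append(WHITE)   # unknown paper under the ink -> blank white
            else:
                text_row.append(None)  # transparent: nothing drawn in the text layer
                bg_row.append(px)      # paper texture stays in the background
        text_layer.append(text_row)
        background.append(bg_row)
    return text_layer, background


def flatten(text_layer, background):
    """Stack the layers: wherever the text layer has ink, it covers the background."""
    return [
        [t if t is not None else b for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(text_layer, background)
    ]


# Off-white paper (190-205) with a few dark ink pixels (10-15).
scan = [
    [200, 190, 10, 195],
    [205, 15, 12, 198],
]
text, bg = split_layers(scan)
print(bg)                         # [[200, 190, 255, 195], [205, 255, 255, 198]]
print(flatten(text, bg) == scan)  # True: viewed together, the page looks unchanged
```

Viewed with both layers stacked, nothing looks odd; pull the background layer out on its own and every ink pixel has become a pure white spot that is brighter than the off-white paper around it.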

Make sense?

Yes. Your explanation does make sense to me!

Thank you for your patience with this techno-tard.

(I almost said, “You lost me at ‘When you scan a document into a PDF...’” but then I kept reading and it all fell together.)

I still wish someone would post a demonstration for the others to see.

GunRunner, I truly respect your knowledge and I would really appreciate your opinion of this:

http://market-ticker.org/cgi-ticker/akcs-www?post=185094

If you have the time and the inclination, would you please share your thoughts?

Marie

That actually makes a LOT of sense, GR, as an explanation of why there is a faint whitish outline around the letters. Thanks for the explanation!

Thank you for that very good explanation.

And congrats on your first decade of FReeping! I'm in my second decade now, too.