Replies

"Those numbers can't be seen unless you separate the pieces."

Negative. If we couldn't see the denser, heavier characters, which ARE separated, how is it that all those lesser characters that were not processed remain perfectly visible on the background layer? For comparison, check out the unprocessed AP version of the same document.

"Second, you haven't even addressed where those stray numbers come from."

Yes I did. In fact I did so first.

"You believe any of the regular black letters and numbers weren't picked up by OCR but two handwritten signatures were?"

I have never stated that the irregularities of this document resulted from OCR. At least not from a completed OCR process. Prior to OCR you'd want to clean up the doc as much as possible so the software can recognize enough pixels to select a corresponding font and make the document searchable. That would be SOP in any office that receives multiple documents a day. Say, an attorney's office, for example.

"Those white specks, if on the scanner glass, would have just blocked out anything being scanned, not be scanned themselves."

This is interesting. If those artifacts were from, say, some white-out specks on the Xerox at the HDOH, they would indeed have shown up as black on the safety paper printout. But if they were on the scanner at the WH, they would have been scanned as white. I just tested this. However, there would be matching areas missing from the background layer.

I've found these white artifacts to be the one thing that has not yet been plausibly explained.

"If the mono/color problem happened with software as you say it would not have picked up half a signature nor would it have been turned into an image file."

Negative. Those layers are not monochrome. They are simply separate color layers of a color image, whose colors happen to be black and white. And, nefarious or not, whatever happened was the result of software. I could create a file identical to the WH version in 15 minutes or so using Photoshop's Magic Wand and a few filter passes.

Thing is, any graphic artist worth his salt would not use that tool. So either this was done by a complete noob or by some half-assed default scanning software selection.
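For anyone who wants to see the layer idea in concrete terms, here is a rough sketch of how software can split one scan into a dark "text" layer and a background layer. To be clear, this is NOT the process that produced the WH file and it is not Photoshop's Magic Wand either (that tool selects connected regions by tolerance). It is just a plain global brightness cutoff, and the cutoff value, the function names and the tiny sample "page" are all made up for illustration.

```python
WHITE = 255  # 8-bit grayscale value used to fill the holes left in the background


def split_layers(pixels, threshold=100):
    """Split a grayscale image (list of rows of 0-255 ints) into two layers.

    Pixels darker than `threshold` (a hypothetical cutoff) move to the
    foreground layer and leave a white hole in the background. Everything
    else stays on the background untouched.
    """
    foreground, background = [], []
    for row in pixels:
        fg_row, bg_row = [], []
        for value in row:
            if value < threshold:
                fg_row.append(value)   # dense, dark stroke: lifted out
                bg_row.append(WHITE)   # matching area now missing from background
            else:
                fg_row.append(None)    # None marks a transparent foreground pixel
                bg_row.append(value)   # lighter marks remain visible here
        foreground.append(fg_row)
        background.append(bg_row)
    return foreground, background


def composite(foreground, background):
    """Stack the foreground back over the background."""
    return [
        [bg if fg is None else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# Made-up 2x4 "page": 20 and 30 are heavy characters, 140 and 150 are
# fainter characters, 230 is paper.
page = [
    [230, 20, 230, 140],
    [230, 30, 230, 150],
]
fg, bg = split_layers(page)
```

In this toy case the heavy strokes end up on their own layer with white holes behind them, the fainter ones never leave the background, and stacking the two layers gives back the original page.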

OCR read problems are notoriously difficult to explain. My first use of OCR technology was in 1973, which is really very early for that stuff. We even specified boxes where the respondents were to print the characters.

Well, the OCR read the documents but most of the documents had some sort of stray marks placed there by the respondents.

OCR technology is now so advanced that the USPS can read virtually 100% of all handwritten addresses, or printed addresses of any font.

I'm thinking the OCR used to originally "scan" the source documents for input into the microfiche system was a low-res standard TV camera. So there'd be all sorts of "noise" in the signal that modern software with a gazillion-pixel resolution could probably pick up and toss into separate "layers" or "buckets" or whatever we wanted.

I know most folks don't think of TV as being just another OCR since it's analog, but there you have it. If all you wanted was to spit out a downstream "photostat" you'd simply record the full analog image. Digitization of that image definitely has to be closer to 2000 than to 1961, I'll guarantee.