Free Republic

To: ml/nj

There are certain things machines do well. Performing many calculations very quickly is one, and it is why computers are so good at playing chess. Distinguishing text from non-text, in context, is not. OCR technology is pretty good, and getting better, but it's not perfect.

I deal with OCR'd documents every day (I'm an attorney; a number of my clients have scanned many of their contracts and saved them as PDFs). While the technology works pretty well, if I need to find particular language in a contract, a search of OCR'd text is no substitute for a pair of human eyes on the document.


74 posted on 05/11/2011 8:31:29 AM PDT by Conscience of a Conservative


To: Conscience of a Conservative
Distinguishing text from non-text, in context, is not. OCR technology is pretty good, and getting better, but it's not perfect. I deal with OCR'd documents every day (I'm an attorney; a number of my clients have scanned many of their contracts and saved them as PDFs).

Of course, the purpose of the pdf was to be a portable document format suitable for transfer to a printer. The OCR capability is a plus that came later. And just as early chess programs couldn't beat an average player, OCR programs aren't perfect.

In the case of the White House pdf, I have now focused on the signatures, which are clearly non-text and which obviously (to me, anyway) have different sources, pdf software or no.

I would be curious, though, about the pdf documents you see in your legal work. I assume you produce some and receive others. Here I'm only concerned with the ones that shouldn't be composites. Do you ever see any where even the text and the signatures have different pixelation? (Let alone two signatures next to each other!) And do you know anything about how the pdfs you produce become pdfs, or do you just scan and a pdf shows up on your computer? (As opposed, say, to obtaining a jpg from a scan and then using Acrobat to include the jpg image in a pdf?)

ML/NJ

82 posted on 05/11/2011 10:37:19 AM PDT by ml/nj



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson