Free Republic
Browse · Search
Bloggers & Personal
Topics · Post Article

To: Kleon
(LMAO)!!! Obviously, you don't even have a clue how OCR and PDF software basically function. The fonts being referenced are not the fonts in the document being scanned. The font information is in the PDF's font table, which is taken from a table created by the OCR software as it attempts to interpret images and make its best guess as to which alphabetic character, numeral, or symbol each image might represent. Since there are no exactly corresponding digital fonts for handwriting, or even for some of the typographic characters in old documents, the OCR software translates its interpretations into generic serif and sans-serif fonts when no known font fits. The PDF software then imports the font table from the OCR software to represent the text, whether or not the original text was ever in such a font.

The absence of font table information indicates no OCR scan occurred, because an OCR scan creates a font table and exports it to the PDF file. The text information in the PDF came from an image scan instead of an OCR scan.
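For anyone who wants to check a file for themselves, here is a rough sketch of what "looking for font table information" could mean in practice. It is only a heuristic under stated assumptions: it scans the raw bytes for uncompressed `/Type /Font` dictionaries, so fonts stored inside compressed object streams would be missed, and the function names are made up for illustration. A proper check would use a real PDF parser.

```python
import re

# Matches a font dictionary entry such as "/Type /Font",
# but not "/Type /FontDescriptor" (the \b stops at the word boundary).
FONT_OBJECT = re.compile(rb"/Type\s*/Font\b")


def count_font_objects(pdf_bytes: bytes) -> int:
    """Count uncompressed font dictionaries in raw PDF bytes (heuristic only)."""
    return len(FONT_OBJECT.findall(pdf_bytes))


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Return True when no uncompressed font dictionary is found.

    Assumption: a hit suggests text drawn with fonts (for example an OCR
    text layer); no hit is consistent with an image-only scan, but it does
    not prove it, since fonts may sit in compressed streams.
    """
    return count_font_objects(pdf_bytes) == 0
```

To try it on a real file, read the file in binary mode (for example `open(path, "rb").read()`, where `path` is whatever PDF you are examining) and pass the bytes to `looks_image_only`.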

81 posted on 08/02/2011 9:55:37 PM PDT by WhiskeyX
[ To 75 ]


To: WhiskeyX
The absence of font table information indicates no OCR scan occurred, because an OCR scan creates a font table and exports it to the PDF file.

If the desired result were either to make the text in the PDF searchable or to replace it with computer fonts (in other words, a full OCR process), then this would be the case. But like I said, that wouldn't make sense with a document like this. The software here was used to detect text blocks and enhance them accordingly.

111 posted on 08/03/2011 8:34:55 AM PDT by Kleon
[ To 81 ]



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson