Free Republic
Browse · Search
Bloggers & Personal
Topics · Post Article

To: Brookhaven
The text displayed by PDF files is text. A PDF is not a graphic file. The text in the typical PDF file is quite easy to search. By default, all PDF files are searchable.

That's wrong. You said yourself in #52 that "The result of scanning a document is a single image file (a bmp, jpg, etc...)," and in #55 that "Scanning is the process of turning a printed document into an IMAGE FILE." If you save that scan as a PDF (yes, you can save an image file as a PDF), you get a PDF graphic file. If there's text on the page you scanned, you just get a picture of the text in PDF format, but the computer doesn't know it's text--it's just pixels. Text in a PDF file generated by a computer program, like Adobe InDesign or Microsoft Word, is readable and searchable because the computer "knows" it's text to begin with. Text on a scanned document is not--unless you perform OCR on it.
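To make the distinction concrete, here is a rough sketch in Python. It assumes hand-written, uncompressed page content streams; real PDFs usually compress their streams, so an actual tool would need a PDF library to decode them first. The sample streams and the `classify_page` helper are made up for illustration. It only looks for the standard text-showing operators (`Tj`/`TJ` inside `BT ... ET`) and the `Do` operator that paints an embedded object such as a scanned image.

```python
import re

# Two hand-written, uncompressed page content streams (illustrative only).
TEXT_PAGE = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET"  # real text
SCAN_PAGE = b"q 612 0 0 792 0 0 cm /Im0 Do Q"               # just a picture

# BT ... ET brackets a text object; Tj and TJ are the text-showing operators.
SHOW_TEXT = re.compile(rb"BT\b.*?\b(Tj|TJ)\b.*?ET\b", re.DOTALL)
# "/Name Do" paints an XObject. In a scan that is the page image,
# although Do can also paint a form XObject.
DRAW_XOBJECT = re.compile(rb"/\w+\s+Do\b")


def classify_page(content: bytes) -> str:
    """Heuristic: does this content stream show text, paint an object, or both?"""
    has_text = bool(SHOW_TEXT.search(content))
    has_image = bool(DRAW_XOBJECT.search(content))
    if has_text and has_image:
        return "image plus text"
    if has_text:
        return "text"
    if has_image:
        return "image only"
    return "empty"


if __name__ == "__main__":
    print(classify_page(TEXT_PAGE))
    print(classify_page(SCAN_PAGE))
```

A page that comes back "image only" has nothing for a search to find until OCR adds a text layer.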

Commercial scanners don't scan and OCR in one operation....OCRing of the document is done--by a separate computer--on the image after it has been scanned into the system.

I think you're behind the times. From an Adobe blog entry from November 2009:

If you have a paper document, that you need to scan and also make searchable, you can use Acrobat to do both in a single step. Go to Taskbar Create >> PDF from Scanner and choose any of the 3 document presets (Black & White Document, Grayscale Document, Color Document). These 3 Presets have OCR option enabled by default so you can get a fully searchable scanned PDF in a single click.
And I posted a description from another software vendor above.
78 posted on 05/04/2011 9:46:48 PM PDT by Ha Ha Thats Very Logical


To: Ha Ha Thats Very Logical

hahaha why does a BC need to be made searchable? Just show it to us.


79 posted on 05/05/2011 5:50:00 AM PDT by JohnnyP

To: Ha Ha Thats Very Logical
If you save that scan as a PDF (yes, you can save an image file as a PDF), you get a PDF graphic file

Which of these is most like a PDF:


(1) A graphic file (myfile.tiff or myfile.jpg)
(2) A Microsoft Word document (myfile.doc)

The answer is 2--a Microsoft Word document. Hence the name Portable Document Format.

The only reason people think of them as different is that most people have software on their desktop to open, edit, and save Word files, while they don't have the software to open, edit, and save PDF files. But with the right software, a PDF can be opened, edited, and saved just like a Word file.

A PDF graphic file? Think about what you just said in the context of a Microsoft Word file. A Word graphic file? There is no such thing. What does exist is a Word file that has a graphic file embedded in it. And if you open that Word document you can access the actual graphic contained inside the Word file.

In the same way, there is no such thing as a Portable Document Format (PDF) graphic file. What you have is a PDF format file that contains a graphic image--it's a wrapper around the graphic. Just like the Word file is a wrapper around the graphic.

I imagine what this company does is open up the PDF, look for any image files embedded inside it, OCR each embedded image, then add the text found on the image to the PDF as actual (invisible) text. It's the same process you might use with a Word file to let you "search" on the graphic image inside the Word file. But think about it: why would you ever want to do this process with a Word file?

As far as the Adobe process you quoted, the result of that is going to be a PDF that contains (1) an image file--a single image file, and (2) text--not computer gibberish--actual text.
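Roughly, a page like that (one image plus a layer of real text) could be laid out as in the Python sketch below. This is a guess at the general shape, not a description of what Acrobat or any other product actually writes. The function names, the `/F1` font name, the font size, and the `words` list (standing in for whatever an OCR step would return) are all assumptions for illustration. The one standard piece is `3 Tr`, the PDF text rendering mode for invisible text: the words are there to be searched and copied, but only the picture is drawn.

```python
def escape_pdf_string(text: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def searchable_scan_stream(image_name, width, height, words):
    """Build a page content stream: one full-page image plus invisible text.

    words is a list of (x, y, text) tuples, standing in for the output
    of a hypothetical OCR step.
    """
    # Scale the unit square to the page size and paint the scanned image.
    lines = [f"q {width} 0 0 {height} 0 0 cm /{image_name} Do Q"]
    lines.append("BT")
    lines.append("3 Tr")       # text rendering mode 3: invisible text
    lines.append("/F1 10 Tf")  # assumed font resource name and size
    for x, y, text in words:
        # Tm positions each word absolutely at the spot where OCR found it.
        lines.append(f"1 0 0 1 {x} {y} Tm ({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


if __name__ == "__main__":
    print(searchable_scan_stream("Im0", 612, 792, [(72, 720, "Hello (world)")]))
```

The image and the text stay separate objects on the page: delete the text layer and you are back to a plain picture in a PDF wrapper.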

81 posted on 05/05/2011 6:54:54 AM PDT by Brookhaven (Moderates = non-thinkers)

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson