Why can’t they import Word or PDF files instead of scanning hard copies?
Rhetorical question.
I do not expect you to have an answer.
In our situation, the documents were so old that they never existed as PDFs. They were scanned to PDF. Not to be confused with native PDF documents, of course.
Hi Pluribus - yes, you’re correct: DOCX or PDF files can be imported directly as text. And both are usually searchable as individual files (though PDFs might not be).
But sometimes those docs may contain true handwriting.
Imagine, for instance, an affidavit containing the scan of a hand-marked-up work order for the repair of a Georgia toilet. That might need to be scanned for text, or it might be OK to leave it as a scan, depending on an organisation’s workflow.
Case in point. I once had the job of creating a database using text that I OCR-ed from hand-written work-orders scanned to doc files.
These work-orders detailed railway incidents and remedial work on those incidents. The orders obviously had massive potential significance due to legal liability. My work made them searchable.
Data collection tools are far more digitised these days, but not everybody can wave a tablet or a phone and gather all they need from a crash, a bridge-strike location, the site of a leak etc.
Handwritten forms are still a big deal. I can certainly imagine a large organisation bound by Government regulations retaining a safe catch-all way of doing things. At least as a fall-back.
Schools, hospitals, nuclear facilities, courts - the risk of missing a reference by not OCR-ing everything might be enormous.
And you always have the clean PDFs to work from if you need to. I guess that OCR-ed text is for searching on, not for presenting to a Judge.