Free Republic
Browse · Search
General/Chat
Topics · Post Article

OCR + Searchability of Legal Documents.
Agere_Contra | Me

Posted on 11/26/2020 10:08:42 AM PST by agere_contra

Fellow FReepers!

Sidney Powell can spell just fine!

You are probably getting the idea by now, but it seems that ALL documents submitted to court are put through an OCR process.

OCR stands for 'Optical Character Recognition'. It converts different types of documents (scanned paper documents, digital images, PDFs containing scanned pages and so forth) into plain, machine-readable text.

The result is a searchable file.

This is to ensure that the text in those documents is searchable by (e.g.) research attorneys, court officials and other legal professionals.

Moreover: the data can *then* be moved to a common database and cross-referenced with other data. Anyone who has ever used a keyword search in FR will know how useful this is.
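To illustrate why getting all the text into one common database matters, here is a minimal Python sketch of a keyword index (an "inverted index": each word maps to the documents containing it). It assumes the filings have already been turned into plain text, by OCR or otherwise. The filing IDs and text are made up, and this is not how any actual court system is known to work.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Inverted index: map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # keep only docs that also have this word
    return result

# Hypothetical filings, for illustration only.
filings = {
    "case-a": "Complaint filed in the Eastern District of Michigan",
    "case-b": "Affidavit of a poll watcher",
    "case-c": "Emergency motion in the Northern District of Georgia",
}
index = build_index(filings)
print(sorted(search(index, "district")))          # ['case-a', 'case-c']
print(sorted(search(index, "district georgia")))  # ['case-c']
```

Once every filing is in the same index, one query searches all of them at once, which is the point of machine-reading everything rather than relying on each file being searchable on its own.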

__________

I don't know for sure, but I would expect that Sidney submits her various Krakens & affidavits in the form of write-protected PDFs & scans of signed documents.

Some of these file formats (PDFs for instance) could already be searchable - but the important thing for the court is to get ALL text into a common database. Searchability of individual files is not helpful - it *all* has to be machine-read.

__________

I suspect that each court Sidney submits files to carries out the following steps:

* Print out any electronic files to paper - OR alternatively require her to duplicate all submissions in the form of hardcopy.

* OCR all received hardcopy, and so expose all included text to the court library system.

__________

'District' vs 'Districct'

Large-font titles near the beginning of documents seem to get particularly hacked about by OCR. This may be because they are in a cursive font? I don't know for sure.
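A side effect of slips like 'Districct' is that an exact keyword search will miss that word. Below is a minimal Python sketch, assuming plain OCR output, of a more forgiving search using the standard library's difflib. The function name fuzzy_find, the sample text and the 0.8 cutoff are my own illustrative choices, not anything a court is known to use.

```python
import difflib
import re

def fuzzy_find(text, term, cutoff=0.8):
    """Return the distinct words in text (lowercased) that closely resemble term."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", text)}
    # get_close_matches scores each candidate with SequenceMatcher.ratio()
    # and keeps up to n candidates scoring at least `cutoff`.
    return sorted(difflib.get_close_matches(term.lower(), words, n=5, cutoff=cutoff))

# Hypothetical OCR output containing one garbled word.
ocr_text = "UNITED STATES Districct COURT, EASTERN DISTRICT OF MICHIGAN"
print(fuzzy_find(ocr_text, "District"))  # ['districct', 'district']
```

An exact search for 'district' would only find the correctly spelled word. Lowering the cutoff catches worse garbling, at the cost of more false hits.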

Hope this proves helpful.


TOPICS: Miscellaneous
KEYWORDS: court; filings; lawsuits; ocr; powell
I have to work so I can't stay and police this article.

I am NOT in any way a court official, and I don't doubt that some FReepers actually process legal submissions as part of their day job.

I ask them to contribute to this thread - and to clear up any errors I've made.

Agere

1 posted on 11/26/2020 10:08:42 AM PST by agere_contra

To: agere_contra

I agree with your observations with regard to the Michigan filing and OCR causing the spelling and spacing errors. I haven’t spotted any obvious errors in the Georgia filing, but then I haven’t read it as thoroughly either.


2 posted on 11/26/2020 10:12:15 AM PST by Yo-Yo (is the /sarc tag really necessary?)

To: agere_contra

OCR makes sense. I use it a lot to find indexed news articles from newspaper archives, and the “spelling” mistakes are often very bizarre.


3 posted on 11/26/2020 10:22:57 AM PST by PghBaldy (12/14 - 930am -rampage begins... 12/15 - 1030am - Obama's advance team scouts photo-op locations.)

To: Yo-Yo

I hope you’re right, because some of the errors are unforgivable! But that’s not important, is it? What’s really important is the content that describes a deeply corrupt election system. One that precludes any confidence in the system whereby the people of these united states of America elect their representatives. From dogcatcher to President it’s completely corrupt and unreliable at each and every level! This is the kind of corruption that breeds revolution.

It cannot be left in place.


4 posted on 11/26/2020 10:28:22 AM PST by Samurai_Jack (Democrats are not the enemy, Republicans are not your friends. We're on our own folks!)

To: agere_contra

You are mostly on target. At some point the documents are scanned and converted via OCR into a searchable text file (this is actually somewhat cumbersome and unnecessary, but change in the mechanics of the legal system is glacial, at best). Both electronic and paper documents are involved. OCR still has issues with fonts that are complex in some fashion (bold, serifs, cursive, even mixed number-letter formats).

The initial filing was likely scanned from paper into an image, then processed via OCR back into a searchable electronic format (PDF) for transmission to the court. The probably immaculate paper documents will follow along and possibly never be looked at in more than a cursory fashion until they are used for trial prep.

I used to do a lot of work with electronic documents and document automation and much of it was for legal documents (lawyers). They submitted documents electronically, but they were followed up by paper copies.


5 posted on 11/26/2020 10:33:47 AM PST by calenel (Tree of Liberty is thirsty.)

To: Yo-Yo
I haven’t spotted any obvious errors in the Georgia filing, but then I haven’t read it as thoroughly either.

See footnote, page I. (It was before my first cup of coffee.)

6 posted on 11/26/2020 11:14:45 AM PST by frog in a pot (The American voter should realize there is nothing democratic about the current Democrat Party.)

To: calenel
The initial filing was likely scanned from paper into an image, then processed via OCR back into a searchable electronic format (PDF) for transmission to the court.

Why paper? I'd expect the docs would go from the word processing software printed directly to PDF. Then the PDF would be uploaded. Generally, government agencies and courts accept ONLY PDF documents.

Now, I don't know if this is relevant - while some here have said that only Adobe has a PDF engine, that's not true. There are others, including open-source ones. Some of these will handle complex or unusual formatting differently.

But I will admit, the errors look like OCR errors. Assuming it was a collaborative effort, I'm not sure how the OCR came into play. Perhaps members of the team were using different software - Word for some, an Apple program for others, etc. At the last minute they may have tried to pull them together, but the conversions were not going well, so they used OCR to bring text from one document to another - copy the text you want to use, then run that through an OCR machine. I do that from time to time - I'll use OCR on some electronic text when I am trying to bring it into one of my docs. No paper is involved.

7 posted on 11/26/2020 11:18:03 AM PST by Fido969 (,i.)

To: agere_contra
but it seems that ALL documents submitted to court are put through a OCR process.

No. They are all submitted in PDF format. Now, most of the common word processing programs out there give the option to save in PDF. If it is a PDF with fillable fields, you need to flatten the PDF before filing to make sure no one messes with the entries.

Most of the pleadings I've downloaded over the years have been pretty clean.

8 posted on 11/26/2020 11:27:00 AM PST by PAR35

To: Fido969
I'd expect the docs would go from the word processing software printed directly to pdf.

They may have taken it out of current versions, but my old Microsoft Office Word will save directly to PDF. Or, as you suggested, you can print to PDF.

9 posted on 11/26/2020 11:30:25 AM PST by PAR35

To: agere_contra
You are probably getting the idea by now, but it seems that ALL documents submitted to court are put through a OCR process.

Wrong.

Pleadings are drafted in a word processor such as WordPerfect. A PDF is created and saved within WordPerfect by pushing a button. No OCR of the electronic document is required to produce the PDF; it is simply a conversion process. The PDF is submitted to the court via the electronic filing system.

Attachments to pleadings may be scanned copies of paper documents.

10 posted on 11/26/2020 11:39:15 AM PST by woodpusher

To: PAR35
A couple of points:

1. In BOTH filings (GA and MI) errors occurred in the first page headers. That is some "coincidence".

2. PDFs generated from a word processor are already searchable. There is no reason to run them through OCR for searchability.

3. A somewhat different tack - The "leak" at the counting place was an overflowing toilet on November 3 - voting day. I remember years ago the IRS got way behind on processing tax returns, so clerks would take them into the bathroom with them, and flush them down the toilet. It caused many blocked toilets at the Philadelphia IRS processing center.

You know what would also block toilets?

BALLOTS.

11 posted on 11/26/2020 11:41:39 AM PST by Fido969 (,i.)

To: PghBaldy

OCR doesn’t work that well sometimes. Sometimes it is faster for a human being to transcribe the document.


12 posted on 11/26/2020 11:43:49 AM PST by dhs12345

To: agere_contra

I like the theory that the errors were deliberate — to force the leftist media to cover the story. Of course, the media focuses on the non-issue of the spelling errors — but by doing so, they are actually covering the story, which they’d otherwise ignore and be trying to bury.


13 posted on 11/26/2020 11:52:53 AM PST by NewJerseyJoe (Rat mantra: "Facts are meaningless! You can use facts to prove anything that's even remotely true!")

To: calenel

Any idea why the GA filing has no case number?


14 posted on 11/26/2020 12:20:41 PM PST by dynoman (Objectivity is the essence of intelligence. - Marilyn vos Savant)

To: woodpusher

Can you answer post 14?


15 posted on 11/26/2020 12:23:17 PM PST by dynoman (Objectivity is the essence of intelligence. - Marilyn vos Savant)

To: dhs12345
Sometimes it is faster for a human being to transcribe the document.

Not more than a paragraph or two - unless they type like 120 words per minute.

OCR is getting pretty good, and easy to use. I would NEVER OCR text without looking at the result carefully, though.

16 posted on 11/26/2020 12:24:33 PM PST by Fido969 (,i.)

To: dynoman

I saw that. My guess is because it is a new case, the court assigns the number. In that case it would be left blank as a courtesy to the clerk.


17 posted on 11/26/2020 12:25:38 PM PST by Fido969 (,i.)

To: Fido969

Some people on Twitter are saying it hasn’t been filed.


18 posted on 11/26/2020 12:33:50 PM PST by dynoman (Objectivity is the essence of intelligence. - Marilyn vos Savant)

To: dynoman

Possibly - but you wouldn’t come to that conclusion because there wasn’t a case number. How could they know the case number on a new filing?


19 posted on 11/26/2020 12:45:52 PM PST by Fido969 (,i.)

To: agere_contra

Why can’t they import Word or PDF files instead of scanning hard copies?

Rhetorical question.

I do not expect you to have an answer.


20 posted on 11/26/2020 12:56:31 PM PST by E. Pluribus Unum (You are in far more danger from an authoritarian government than you are from a seasonal virus.)



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson