That is pretty much exactly what Leo found. As far as I know, it is an accurate description. If Justia goes to remove itself from InternetArchive.org, that would be an admission of guilt. And believe me, we are watching the archive to see if Justia puts up more of them on the pages we have published today, and the ones Donofrio has published at his site: http://naturalborncitizen.wordpress.com/2011/10/20/justia-com-surgically-removed-minor-v-happersett-from-25-supreme-court-opinions-in-run-up-to-08-election/
I would also refer you to the following: robotstxt.org's About /robots.txt
Or this other informational FAQ directly from archive.org: How can I have my site's pages excluded from the Wayback Machine?
You could always look at the Google cache for the site(s)/URL(s) in question, because Google's crawler sometimes ignores the robots.txt file. You could also search Google for the term "robots.txt disallow" (without the quotes), or use that term in whatever search engine you prefer (if you eschew Google for whatever reason), to get the same info.
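For anyone who wants to see what a robots.txt exclusion looks like in practice, here is a rough sketch using Python's standard urllib.robotparser module. The robots.txt text, the example.com URL, and the is_allowed helper are all made up for illustration; "ia_archiver" is the user-agent name that, as far as I know, the archive.org FAQ has told site owners to block, so treat that as an assumption too. This only shows how the rules are read, not what any particular crawler actually does with them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the "ia_archiver" agent from the whole site
# while leaving every other crawler unrestricted (an empty Disallow allows all).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page.html"  # placeholder URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

A crawler that honors the file would skip the page for the blocked agent and fetch it for the others; a crawler that ignores robots.txt is not bound by any of this, which is why a search-engine cache can still hold a copy.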
Regardless, thanks for the article; it is very interesting!