Replies

That is pretty much exactly what Leo found. As far as I know, it is an accurate description. If Justia removes itself from InternetArchive.org, that would be an admission of guilt. And believe me, we are watching the archive to see if Justia puts up more of them on the pages we have published today, and the ones Donofrio has published at his site: http://naturalborncitizen.wordpress.com/2011/10/20/justia-com-surgically-removed-minor-v-happersett-from-25-supreme-court-opinions-in-run-up-to-08-election/

He is specifically speaking about the robots.txt file with a User-agent/Disallow rule set, so that archiving sites (the ones that actually obey the protocol) do not crawl and archive the site, domain, or URL in question. The file itself is not a robot; it is a message telling the crawling bot to ignore the site (not all bots will obey).
See the following: Block or remove pages using a robots.txt file

I would also refer you to the following: robotstxt.org's About /robots.txt
Or this other informational FAQ directly from archive.org: How can I have my site's pages excluded from the Wayback Machine?
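
For anyone who wants to see what that check looks like from the crawler's side, here is a rough sketch using Python's standard urllib.robotparser module. The robots.txt body, the example.com URL, the "SomeOtherBot" name, and the is_allowed helper are all made up for illustration, and "ia_archiver" is assumed here to be the user agent the archive's crawler answers to. It only shows how a well-behaved bot would read the rule, not what any particular site actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. "ia_archiver" is assumed to be the
# archive crawler's user agent; the rule blocks it from the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/some/page.html"  # placeholder URL
    print("ia_archiver allowed: ", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that nothing in the file enforces anything: the bot has to run a check like this on its own and choose to respect the answer, which is why a bot that ignores the protocol can still crawl the page.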

You could always look at the Google cache for the site(s)/URL(s) in question, because Google's crawler sometimes ignores the robots.txt file. You could also search Google for the term "robots.txt disallow" (sans quotes), or use that term in whatever search engine you prefer (if you eschew Google for whatever reason), to get the same info.

Regardless, thanks for the article; it is very interesting!