Free Republic
Browse · Search
News/Activism
Topics · Post Article

To: Danae
He is specifically speaking about a robots.txt file with a User-agent entry and a Disallow rule set, so that archiving sites (the ones that actually obey the protocol) do not crawl and archive the site/domain/URL in question. The robots.txt file is not itself a robot; it is a message telling crawling bots to ignore the site (not all bots will obey it).
See the following: Block or remove pages using a robots.txt file

I would also refer you to the following: robotstxt.org's About /robots.txt
Or this other informational FAQ directly from archive.org: How can I have my site's pages excluded from the Wayback Machine?
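To make the mechanism concrete, here is a minimal sketch of such a file. This assumes the goal is to block the Internet Archive's crawler (which has historically identified itself as "ia_archiver") as well as all other compliant crawlers; the exact agent names a site targets are up to its owner:

```
# Block the Internet Archive's crawler specifically
User-agent: ia_archiver
Disallow: /

# Block all other compliant crawlers from the entire site
User-agent: *
Disallow: /
```

A compliant crawler fetches /robots.txt before crawling, finds the record matching its user-agent string, and skips any path covered by a Disallow rule; "Disallow: /" covers the whole site.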

You could always look at the Google cache for the site(s)/URL(s) in question, because Google's cache sometimes retains pages even when robots.txt later changes. You could also search Google for robots.txt disallow, or use that term in whatever search engine you prefer (if you eschew Google for whatever reason), to get the same info.
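For anyone who wants to test how a given robots.txt is interpreted, Python's standard library ships a parser for the protocol. This is only an illustrative sketch with a made-up rule set (the "ia_archiver" entry mirrors the exclusion discussed above); it is not tied to any particular site:

```python
# Sketch: check whether a crawler may fetch a URL under a given
# robots.txt, using Python's standard-library parser.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the archive.org crawler only.
robots_txt = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The archiving bot is disallowed everywhere under this rule set...
print(parser.can_fetch("ia_archiver", "/any/page.html"))  # False
# ...while agents with no matching record default to allowed.
print(parser.can_fetch("Googlebot", "/any/page.html"))    # True
```

Note that this only tells you what a *compliant* crawler would do; as the post above says, a bot is free to ignore the file entirely.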

Regardless, thanks for the article; it is very interesting!

135 posted on 10/20/2011 10:10:47 PM PDT by jurroppi1


To: jurroppi1

http://obamareleaseyourrecords.blogspot.com/2011/10/new-york-state-board-of-elections.html

New York State Board of Elections Website Blocking Access To Natural Born Citizen Requirements


142 posted on 10/20/2011 11:52:30 PM PDT by rolling_stone



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson