I would also refer you to robotstxt.org's page, About /robots.txt
Or to this FAQ entry directly from archive.org: How can I have my site's pages excluded from the Wayback Machine?
You could always look at the Google cache for the site(s)/URL(s) in question, because Google's crawler sometimes ignores the robots.txt file. You could also search Google for the term "robots.txt disallow" (without the quotes), or use that term in whatever search engine you prefer if you eschew Google for whatever reason, to get the same info.
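If you want to check for yourself what a given robots.txt actually blocks, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the is_allowed helper are all made up for illustration; ia_archiver is the user-agent name that the archive.org FAQ has, as far as I know, historically told site owners to disallow, so treat that as an assumption too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
# It blocks the Wayback Machine's crawler everywhere and blocks
# every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as an iterable of lines, so no network
    # request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        for url in ("http://example.com/page.html",
                    "http://example.com/private/doc.html"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

To test a live site instead, you could call set_url() with the site's robots.txt address and then read() on the parser in place of parse(). Keep in mind that this only tells you what the file asks crawlers to do, not whether any particular crawler honors it.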
Regardless, thanks for the article; it is very interesting!
http://obamareleaseyourrecords.blogspot.com/2011/10/new-york-state-board-of-elections.html
New York State Board of Elections Website Blocking Access To Natural Born Citizen Requirements