Posted on 02/15/2005 7:20:13 PM PST by Brian Mosely
The robots.txt file is supposed to be a tool for keeping search engines away from directories on your web site that you don't want spidered or indexed. The major search engines all claim they obey it, but warn that there may be a delay between when a robots.txt file is changed and when the spider reads and follows it. All nice and good in print, but the reality is scary.
To cut down on bandwidth use, I recently listed two directories containing seldom-used message boards as disallowed in my robots.txt. Almost immediately Google began hitting those directories with the fervor of a teen-age hacker. The index page of one alone received 692 hits in one day from Googlebots.
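For readers who have not written one, here is a minimal sketch of what a Disallow rule looks like and how a well-behaved crawler is expected to read it. It uses Python's standard urllib.robotparser; the directory names, the example.com host, and the helper name allowed are placeholders made up for illustration, not the poster's actual setup. Keep in mind that robots.txt is only a request to crawlers, and the file itself is publicly readable by anyone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the directory names are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /oldboard1/
Disallow: /oldboard2/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/oldboard1/index.html"):
        url = "http://example.com" + path
        print(path, allowed(ROBOTS_TXT, "Googlebot", url))
```

A crawler that honors the file would skip anything under the two disallowed directories and fetch the rest. Nothing in the file can force a crawler to comply.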
Now add that bit of info to the recent story from Reuters about hackers discovering a wealth of information regarding things most people don't want on the internet -- at Google.com. (I mentioned it here.) Could Google be using the robots.txt files to intentionally harvest data people want hidden?
Not scary enough for you? Well, add to that the problems Michelle Malkin, Charles Johnson and other bloggers have had getting their blogs listed on Google News. Apparently Google refused to add Conservative blogs, but has no problem adding Liberal blogs such as Wonkette or the Democrat Underground.
Then what I reported earlier today about the political contributions of Google employees should come as no surprise.
Let's add it up: Google, a blatantly Liberal entity, is found to have tons of sensitive data archived on its site, and seems to be using the robots.txt files to sniff out where that sensitive information is hidden. Why would they want it, and what do they plan to do with it? The last election was pretty dirty and stuff was being dug up left and right. Could Google be building a dirt chest of secrets to unload during the next election?
Interesting. The robots file would definitely give one a leg up in finding non-indexed content.
Is there a way to spider-proof web pages?
How about just having no meta tags?
Getting rid of meta tags will not keep you off the search engines. Don't know if this would be an option for you, but you could password protect your web site. That would keep everything off limits to the bots except for your login page.
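As a rough illustration of the password-protection idea, the sketch below checks an HTTP Basic Authorization header against a known user name and password. The function name is_authorized and the credentials in the usage lines are invented for this example. In practice most sites would turn on their web server's built-in authentication rather than hand-roll this, so treat it as a sketch of the principle only.

```python
import base64
import hmac


def is_authorized(auth_header, user, password):
    """Return True if an HTTP Basic Authorization header matches user/password."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        # The part after "Basic " is base64 of "user:password".
        decoded = base64.b64decode(auth_header[6:], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    expected = f"{user}:{password}"
    # Constant-time comparison avoids leaking how much of the string matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    header = "Basic " + base64.b64encode(b"admin:s3cret").decode("ascii")
    print(is_authorized(header, "admin", "s3cret"))  # valid credentials
    print(is_authorized(None, "admin", "s3cret"))    # a bot sending no credentials
```

A spider arrives without credentials, so a server applying a check like this would answer with a 401 response instead of the page content. Unlike robots.txt, that does not depend on the crawler's cooperation.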
Thanks for the post and data. I no longer use Google.
I have been using AltaVista for the past 2 days, but I don't think it is as easy to use as Google... suggestions?
I don't think it would be too big a stretch to imagine someone getting righteously POed at this and bundling up a few little surprises for the spiders to take back to the Google servers.
I know a few people who have the skills. People I'd be afraid to have ticked off at my computers.
Google has pluses, but also some minuses -- it misses, IME, things that I get via other engines. I haven't used altavista in a while. I think I'll try it out.
I can look up posts there from years past. Heck, How To Properly Flame is still listed.
Dogpile looks like bad news. Read this:
http://securityresponse.symantec.com/avcenter/venc/data/spyware.dogpile.html
I'm no computer expert but it scares me.
Oh.
My.
Gosh!
THAT is SO funny! I don't know if I'm just in a really weird mood tonight, or what, but my sides hurt from trying to laugh without waking the rest of the family up. Especially the "Paul & the ISP"...
All your robots are belong to us.
I know plenty of smart people who are very good at what they do (mostly computer related) but have a totally cockeyed view of politics. They are mostly focused on what they do, and they have the classic knee-jerk liberal instincts when it comes to politics. It appears Google attracts a lot of such people.
I just ran a quick test. The search argument 005957160 appears on this Free Republic page from 2001. A Google search on that string turns up a valid hit on a Hillary forum, but nothing on FR. However, Clusty, Yahoo, Altavista, and Accoona all turn up the FR page. MSN draws a blank.
I also tried "site:freerepublic.com". Google claims to have about 2,450,000 pages, Clusty 87,381, Yahoo 745,000, Altavista 756,000, and Accoona 156,366.
Thank you for the heads up. I read the link. But doesn't this mean you'd have to install the toolbar for the spyware to be activated?
Yes, it does.