Posted on 11/11/2004 1:35:03 PM PST by mhking
2004-11-11
Is MSN Crawling Google?
My Theory On This Mysterious Microsoft Crawler
The old MSN required a fee to be crawled by its spider. But a few months back MSN dropped the fee and said it would begin crawling the entire web, without charge. That's no easy task, however. So I believe MSN is using the results from Google, and possibly even Yahoo, to get all of the pages they've indexed on sites that have a relatively low page count in the current MSN search engine.
First off, that's the fastest way to get the relevant pages from a web site. Sure, they could just go to the site directly and start crawling, but in doing so they're going to get tons of duplicate URLs and URLs that seem different but point to the same content. Crawling Google's results will cut the bandwidth wasted on those duplicates to some extent, but will not completely take care of the duplicate content issue their spider will encounter.
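To make the "URLs that seem different but point to the same content" problem concrete, here is a minimal Python sketch of how a crawler might collapse such URLs before fetching them. It is only an illustration, not a description of what MSN's or Google's spider actually does. The function names and the `SESSION_PARAMS` list are hypothetical, and it only catches duplicates that differ in spelling (case, default port, parameter order, session IDs, fragments), not different pages with identical content.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that carry session state, not content.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def normalize(url):
    """Return a canonical spelling of url for duplicate detection."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if port is not None and port != default_port:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop session parameters and sort the rest so order does not matter.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    query = urlencode(sorted(pairs))
    # The fragment (#...) never reaches the server, so it is discarded.
    return urlunsplit((scheme, host, path, query, ""))

def dedupe(urls):
    """Keep the first canonical form of each URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Fed a list of URLs harvested from a site, `dedupe` returns one entry for each group of spellings that normalize to the same address, which is roughly the saving the author attributes to starting from an already-cleaned index.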
Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By creating a baseline number of pages per site when the new Microsoft Search is launched and running a comparison at a regular interval for the next 6 months, they'll be able to determine internally if their engine is finding and indexing the same links, and as many links, as Google. Call it competitive analysis or whatever you want.
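The baseline comparison described above boils down to a set comparison per site. A rough Python sketch follows, assuming you already have two lists of URLs for one site, one from the reference engine and one from your own index. The function name and inputs are made up for illustration, and nothing here is known to be Microsoft's actual process.

```python
def coverage(baseline_urls, our_urls):
    """Compare our index for one site against a baseline list of URLs.

    Returns (fraction of baseline URLs we also have, sorted list of
    baseline URLs we are missing).
    """
    baseline = set(baseline_urls)
    ours = set(our_urls)
    if not baseline:
        # Nothing to compare against, so treat coverage as complete.
        return 1.0, []
    missing = sorted(baseline - ours)
    found = len(baseline & ours)
    return found / len(baseline), missing
```

Running this at a regular interval and logging the fraction per site would show whether the new engine is closing the gap over the six months the author mentions.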
So Microsoft's Screen Scraping?
Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the URLs it finds. It makes sense as a business case, but I wonder if there are any legal issues there. I doubt it. It's like putting garbage out to the curb: once it's out there, it's fair game. But I bet Google's lawyers would have more to say than that on the case.
Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.
Jason's article first appeared on his blog MarketingShift.com.
Just damn.
If you want on the list, FReepmail me. This IS a high-volume PING list...
Sounds just like Microsoft. I hope that no one here uses their new service.
I wonder if Google could respond by conditioning their search on the incoming requester's address. Either send garbage, or refuse and log.
Yawn.
I hope I'm not alone in saying, I have no idea what I just read.
bump
Microsoft did not write the original DOS. They purchased it at some ridiculous price from someone else. They got the idea for Windows from someone else too. They stole most of the good features in IE from Netscape. So Microsoft is known for not having original ideas.
I doubt it. If they tried that, it would slow Google's famously-quick searches considerably, surely have some bugs and lock out a few users, and take a significant amount of resources to execute.
And, above all that, M$ could just proxy their bot, foiling the whole scheme, and both would be thrown into a game of hide-and-go-seek, which would be counterproductive for BOTH sides.
Typical M$ behavior, IMO, but, AFAIK, technically legal. Oh, well...I'm sticking with Google.
You aren't. That was all Greek to me.
Imagine you were going to start a web search engine service to compete with Google. And in order to create your original search results database... you just stole the whole database from Google. Get the picture now?
Not a problem. Us computer geeks like to use a lot of pseudo-sophisticated terminology to intimidate un-computering types.
In short, the article is saying Microsoft is simply using results from Google to populate its search engine.
See post 11.
all geek, you mean
Hmmm. I AM NOT a big fan of M$ anymore. That said, I find it a little hard to swallow that M$ would crawl without masking or using a proxy IP. If they did, there was another reason for it.
:O)
P
Microsoft's MSN search engine was using Yahoo! as the underlying database. I wouldn't be surprised if Microsoft made a deal with Yahoo! to have their database included in the new MSN search engine as a starting point.
Perhaps now the MSN spider is verifying its own database.
The first ad that appeared on my gmail account was for MSN Search.
Comparative analysis maybe, or a way of saying "in your face".
:O)
P
Just don't say sploogle or booble to Google. They don't like it :-) Don't go to those sites! They have crawlers too, but I don't think it has anything to do with the internet!
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.