Free Republic
Browse · Search
News/Activism
Topics · Post Article


Microsoft Crawling Google Results For New Search Engine?
WebProNews ^ | 11.11.04

Posted on 11/11/2004 1:35:03 PM PST by mhking



Jason Dowdell | Contributing Writer

2004-11-11



A developer came to me today with questions about a particular IP address he had been watching scan his site. The IP was 65.54.188.86 and is registered to Microsoft Corp., located at One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the header information normally associated with a crawler, such as a robot name or other identifying info in the User-Agent header, or even a browser name.

The behavior it demonstrated made it look like a crawler, especially since it was spidering URLs that no longer existed (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left URLs uncrawled), and it was doing so at a rate of one page every 3 to 5 seconds. The visitor started its visit at 7:37 am and was still on the site at 12:00 pm.
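
The rate and times above imply a rough volume of requests. A minimal sketch of that arithmetic in Python, assuming the 3 to 5 second gap held steady from 7:37 am to 12:00 pm; since the visitor was still on the site at noon, the figures cover only that window, not the whole visit.

```python
from datetime import datetime


def estimated_requests(start: str, end: str,
                       min_interval_s: float, max_interval_s: float) -> tuple:
    """Return (low, high) estimates of pages requested between start and end.

    Assumes a constant gap of min_interval_s..max_interval_s seconds between
    requests for the whole window (an illustrative simplification).
    """
    fmt = "%H:%M"  # 24-hour clock, e.g. "07:37"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    # The longest gap gives the fewest requests, the shortest gap the most.
    low = int(elapsed // max_interval_s)
    high = int(elapsed // min_interval_s)
    return low, high


print(estimated_requests("07:37", "12:00", 3, 5))  # (3156, 5260)
```

The 4 hours 23 minutes (15,780 seconds) in that window work out to between 3,156 and 5,260 page requests.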

Correction: the identifying data was there after all. Here's the crawler info: msnbot/0.3 (+http://search.msn.com/msnbot.htm)
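
A header like the one above is what lets a server tell a crawler apart from a browser. A minimal sketch of matching a User-Agent string against known crawler tokens; the token list is a hypothetical example (only msnbot comes from the article). Because any client can send any User-Agent, a match identifies what the visitor claims to be, not who it actually is.

```python
import re

# Hypothetical token list; only "msnbot" is taken from the article.
KNOWN_BOT_TOKENS = ("msnbot", "googlebot")

# Matches "<token>/<version>" anywhere in the header, case-insensitively.
BOT_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in KNOWN_BOT_TOKENS) + r")/([\d.]+)",
    re.IGNORECASE,
)


def identify_bot(user_agent):
    """Return (token, version) if the User-Agent names a known crawler, else None."""
    match = BOT_PATTERN.search(user_agent)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


print(identify_bot("msnbot/0.3 (+http://search.msn.com/msnbot.htm)"))  # ('msnbot', '0.3')
```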

Here's the kicker

So now you're saying: so what, big deal? But this really is a big deal, not only because the URLs this visitor requested no longer exist, but because the only place these URLs can be found is in Google's search results, using a site:www.sitename.com query. A similar query on MSN Search doesn't show the URLs at all, even on the beta version of Microsoft's new search engine. Yet within just hours of the visitor's exit from the site, the same search on Microsoft's new search engine shows all of the URLs in question fully indexed in its results.
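
The reasoning in this paragraph amounts to a set comparison: keep the URLs the visitor requested that are dead on the site, appear in Google's site: results, and were missing from MSN Search before the visit. A minimal sketch in Python with hypothetical data (www.sitename.com is the article's placeholder; the page names and lists are invented, since the article does not publish the logs or result pages).

```python
def dead_urls_found_only_in_reference(requested, live_on_site,
                                      reference_index, own_index_before):
    """Return URLs that the visitor requested, that no longer exist on the site,
    that the reference engine lists, and that one's own index lacked beforehand."""
    dead = set(requested) - set(live_on_site)
    return sorted((dead & set(reference_index)) - set(own_index_before))


# Hypothetical data, for illustration only.
requested = ["http://www.sitename.com/", "http://www.sitename.com/old-1.html",
             "http://www.sitename.com/old-2.html"]
live_on_site = ["http://www.sitename.com/"]
google_site_results = list(requested)  # what a site:www.sitename.com query returned
msn_site_results_before = ["http://www.sitename.com/"]

print(dead_urls_found_only_in_reference(requested, live_on_site,
                                        google_site_results, msn_site_results_before))
```

The result is sorted so repeated runs print the URLs in the same order.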



My Theory On This Mysterious Microsoft Crawler

The old MSN search engine required a fee to be crawled by its spider. But a few months back MSN dropped the fee and said it would begin crawling the entire web, free of charge. However, that's no easy task. So I believe MSN is using results from Google, and possibly even Yahoo, to find all the pages those engines have indexed on sites that have a relatively low page count in the current MSN search engine.

First off, that's the fastest way to get the relevant pages from a website. Sure, they could just go to the site directly and start crawling, but in doing so they would pick up tons of duplicate URLs, and URLs that look different but point to the same content. Crawling Google's results would reduce the bandwidth needed to some extent, but it would not completely take care of the duplicate content issue their spider will encounter.
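
Collapsing URLs that look different but point to the same content usually starts with URL normalization. A minimal sketch using Python's standard urllib.parse; the particular rules (lowercase scheme and host, drop default ports, user info and fragments, sort query parameters) are an illustrative assumption, not how any search engine's spider is known to work.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Reduce superficially different URLs to one canonical string."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname excludes any user info
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    netloc = host if port is None or port == DEFAULT_PORTS.get(scheme) else f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so "?b=2&a=1" and "?a=1&b=2" compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))  # fragment dropped


print(normalize_url("HTTP://WWW.SiteName.com:80/page?b=2&a=1#top"))  # http://www.sitename.com/page?a=1&b=2
```

Normalization like this only catches syntactic variants; two different paths serving the same page still need a content comparison, which is the remaining duplicate content problem described above.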

Secondly, crawling Google's results can serve as a quality measure for their new search engine. By recording a baseline number of pages per site when the new Microsoft Search launches, and running a comparison at a regular interval for the next six months, they'll be able to determine internally whether their engine is finding and indexing the same links, and as many links, as Google. Call it competitive analysis or whatever you want.
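
A baseline comparison like this could be as simple as computing, per site, what share of a competitor's known URLs one's own index also holds, and repeating that at each interval. A minimal sketch with hypothetical data; the article talks about comparing page counts, so measuring overlap of the same URLs is an added assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def per_site_coverage(own_index, baseline):
    """Map each host in the baseline to the fraction of its baseline URLs
    that also appear in own_index (1.0 means full coverage)."""
    by_host = defaultdict(set)
    for url in baseline:
        by_host[urlsplit(url).hostname].add(url)
    own = set(own_index)
    # Sorted by host so the report is stable from run to run.
    return {host: len(urls & own) / len(urls)
            for host, urls in sorted(by_host.items())}


# Hypothetical snapshot: URLs a competitor lists vs. URLs in one's own index.
baseline = {"http://a.example/1", "http://a.example/2", "http://b.example/1"}
own_index = {"http://a.example/1", "http://b.example/1", "http://c.example/x"}
print(per_site_coverage(own_index, baseline))  # {'a.example': 0.5, 'b.example': 1.0}
```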

So Microsoft's Screen Scraping?

Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft could very well be screen scraping Google (or maybe even using its API, LOL) and crawling the URLs it finds. It makes sense from a business standpoint, but I wonder whether there are any legal issues there. I doubt it. It's like putting garbage out at the curb: once it's out there, it's fair game. But I bet Google's lawyers would have more to say than that on the matter.
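
The article does not say how the crawler obtained its URL list. As a generic illustration of the screen scraping idea, here is a minimal sketch that pulls the links for one host out of an HTML page already in memory, using Python's standard html.parser; the sample markup is hypothetical and does not reflect any real search engine's result pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class SiteLinkExtractor(HTMLParser):
    """Collect href values of <a> tags that point at target_host."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and urlsplit(value).hostname == self.target_host:
                self.links.append(value)


def extract_site_links(html, target_host):
    parser = SiteLinkExtractor(target_host)
    parser.feed(html)
    parser.close()
    return parser.links


sample = ('<p><a href="http://www.sitename.com/old-1.html">one</a> '
          '<a href="http://other.example/">elsewhere</a> '
          '<A HREF="http://WWW.sitename.com/old-2.html">two</A></p>')
print(extract_site_links(sample, "www.sitename.com"))
# ['http://www.sitename.com/old-1.html', 'http://WWW.sitename.com/old-2.html']
```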

Has anyone out there seen similar behavior on their own sites? If so, please comment with your observations and data.

Jason's article first appeared on his blog MarketingShift.com.


TOPICS: Business/Economy; Culture/Society; News/Current Events
KEYWORDS: google; internetexploiter; microsnot; underweartootight

1 posted on 11/11/2004 1:35:04 PM PST by mhking

To: Howlin; Ed_NYC; MonroeDNA; widgysoft; Springman; Timesink; dubyaismypresident; Grani; coug97; ...

Just damn.

If you want on the list, FReepmail me. This IS a high-volume PING list...

2 posted on 11/11/2004 1:35:23 PM PST by mhking

To: mhking

Sounds just like Microsoft. I hope that no one here uses their new service.


3 posted on 11/11/2004 1:38:32 PM PST by Revel

To: mhking
Despicable behavior, but perhaps legal. Theft of trade secrets? Maybe not--they're publicly available.

I wonder if Google could respond by conditioning their search on the incoming requester's address. Either send garbage, or refuse and log.

4 posted on 11/11/2004 1:38:57 PM PST by Pearls Before Swine
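
The suggestion in post 4, conditioning responses on the requester's address, can be sketched as a check run before serving results. A minimal Python sketch of the "refuse and log" branch, with a hypothetical blocklist and a placeholder string instead of a real search; as post 9 points out, a crawler routed through a proxy would slip past an address check like this.

```python
import ipaddress
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("search-frontend")  # hypothetical service name

# Hypothetical blocklist: this /24 is chosen only because it contains the
# address reported in the article (65.54.188.86).
BLOCKED_NETWORKS = [ipaddress.ip_network("65.54.188.0/24")]


def handle_query(client_ip, query):
    """Refuse and log queries from blocked networks; answer everyone else."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        log.info("refused query %r from %s", query, client_ip)
        return 403, "Forbidden"
    return 200, f"results for {query!r}"  # stand-in for a real search


print(handle_query("65.54.188.86", "site:www.sitename.com"))  # (403, 'Forbidden')
```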

To: mhking

Yawn.


5 posted on 11/11/2004 1:39:32 PM PST by IDRATHERNOT

To: mhking

I hope I'm not alone in saying, I have no idea what I just read.


6 posted on 11/11/2004 1:40:48 PM PST by el_chupacabra (I'm glad you were born.)

To: John Robinson; Jim Robinson

bump


7 posted on 11/11/2004 1:41:04 PM PST by JoJo Gunn (More than two lawyers in any Country constitutes a terrorist organization. ©)

To: mhking

Microsoft did not write the original DOS. They purchased it at some ridiculous price from someone else. They got the idea for Windows from someone else too. They stole most of the good features in IE from Netscape. So Microsoft is known for not having original ideas.


8 posted on 11/11/2004 1:41:42 PM PST by Revel

To: Pearls Before Swine

I doubt it. If they tried that, it would considerably slow Google's famously quick searches, would surely have some bugs and lock out a few users, and would take a significant amount of resources to execute.

And, above all that, M$ could just proxy their bot, foiling the whole scheme, and both would be thrown into a game of hide-and-go-seek, which would be counterproductive for BOTH sides.

Typical M$ behavior, IMO, but, AFAIK, technically legal. Oh, well...I'm sticking with Google.


9 posted on 11/11/2004 1:41:48 PM PST by K1avg

To: el_chupacabra
I hope I'm not alone in saying, I have no idea what I just read.

You aren't. That was all Greek to me.

10 posted on 11/11/2004 1:43:59 PM PST by cardinal4 (W's 3.5 million pop vote isnt a mandate, but algores .5 million is??)

To: el_chupacabra

Imagine you were going to start a web search engine service to compete with Google. And in order to create your original search results database... you just stole the whole database from Google. Get the picture now?


11 posted on 11/11/2004 1:44:07 PM PST by Revel

To: el_chupacabra

Not a problem. Us computer geeks like to use a lot of pseudo-sophisticated terminology to intimidate un-computering types.

In short, the article is saying Microsoft is simply using results from Google to populate its search engine.


12 posted on 11/11/2004 1:44:14 PM PST by K1avg

To: cardinal4

See post 11.


13 posted on 11/11/2004 1:44:29 PM PST by Revel

To: cardinal4

all geek, you mean


14 posted on 11/11/2004 1:45:02 PM PST by Kiss Me Hardy

To: mhking

Hmmm. I AM NOT a big fan of M$ anymore. That said, I find it a little hard to swallow that M$ would crawl without masking its identity or using a proxy IP. If they did, there was another reason for it.

:O)

P


15 posted on 11/11/2004 1:45:02 PM PST by papasmurf (Kerry..." What are you gonna' believe, me, or your own 2 eyes?"..(Groucho Marx))

To: mhking

Microsoft's MSN search engine was using Yahoo! as the underlying database. I wouldn't be surprised if Microsoft made a deal with Yahoo! to have their database included in the new MSN search engine as a starting point.

Perhaps now the MSN spider is verifying its own database.


16 posted on 11/11/2004 1:45:51 PM PST by Yo-Yo

To: mhking

The first ad that appeared on my gmail account was for MSN Search.


17 posted on 11/11/2004 1:47:22 PM PST by js1138 (D*mn, I Missed!)

To: Yo-Yo

Comparative analysis maybe, or a way of saying "in your face".

:O)

P


18 posted on 11/11/2004 1:48:14 PM PST by papasmurf (Kerry..." What are you gonna' believe, me, or your own 2 eyes?"..(Groucho Marx))

To: mhking
Hey, are you a shill for Google or just an anti-Microsoft troll out to spread anti-Microsoft FUD?
Microsoft Research has more and better PhDs than Google will ever have.
Microsoft has been working on its own search engine technology for quite a long time.
Microsoft doesn't need to piggyback off Google to build a great search engine.
In today's Wall Street Journal, Walter Mossberg did a great review of the beta of the new Microsoft search engine, and rates it better than Google in at least 3 respects, even if Google is still ahead overall for the moment.
Mossberg thinks Microsoft is set to at least give Google the fight of their lives.
Microsoft is a great American company.
I'd back Microsoft against the left wing, Bush-hating Google any day.
19 posted on 11/11/2004 1:49:57 PM PST by KwasiOwusu

To: js1138

Just don't say sploogle or booble to Google. They don't like it :-) Don't go to those sites! They have crawlers too, but I don't think it has anything to do with the internet!


20 posted on 11/11/2004 1:51:18 PM PST by BookaT (My Cat's Breath smells like Cat Food!)



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson