Free Republic

Microsoft Crawling Google Results For New Search Engine?
WebProNews ^ | 11.11.04

Posted on 11/11/2004 1:35:03 PM PST by mhking


Jason Dowdell | Contributing Writer

2004-11-11



I was questioned today by a developer who was watching a particular IP address scan his site. The IP was 65.54.188.86 and is registered to Microsoft Corp., located at One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the web server the normal header information associated with a crawler, such as a robot name, identifying info, or even a browser name.

The behavior it demonstrated made it look like a crawler, especially since it was spidering URLs that no longer existed (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left URLs uncrawled) and doing so at the rate of one page every 3 to 5 seconds. The visitor started its visit at 7:37 am and was still on the site at 12:00 pm.
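
As a rough check on the scale of that visit, the figures quoted above (a 7:37 am start, still active at 12:00 pm, one request every 3 to 5 seconds) imply a few thousand page requests. The sketch below is only that arithmetic. The function name is made up for illustration, the calendar date is a placeholder, and it assumes the visitor kept a steady pace for the whole window, which the article does not state.

```python
from datetime import datetime

def estimate_requests(start, end, min_gap_s, max_gap_s):
    """Return (low, high) request counts for a visit with a steady gap between requests."""
    seconds = (end - start).total_seconds()
    # A longer gap between requests gives the low estimate, a shorter gap the high one.
    return int(seconds // max_gap_s), int(seconds // min_gap_s)

# Times taken from the article; the date itself is an assumption and does not affect the result.
start = datetime(2004, 11, 11, 7, 37)
end = datetime(2004, 11, 11, 12, 0)

low, high = estimate_requests(start, end, min_gap_s=3, max_gap_s=5)
print(f"Roughly {low} to {high} requests")
```

Under those assumptions the visit works out to somewhere between about 3,100 and 5,300 requests.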

Correction: the data was there after all. Here's the crawler info: msnbot/0.3 (+http://search.msn.com/msnbot.htm)
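
The identifying string in that correction follows the common crawler convention of `name/version (+info URL)`. Here is a minimal sketch of how a site owner might pull those parts out of a logged User-Agent value. The regular expression and the `parse_bot_user_agent` helper are illustrative assumptions, not something msnbot or any particular web server provides.

```python
import re

# Matches strings shaped like "botname/1.2 (+http://example.com/bot.htm)".
# The pattern is an assumption about the format, based only on the string quoted above.
BOT_UA = re.compile(r"^(?P<name>[^/\s]+)/(?P<version>\S+)\s+\(\+(?P<url>[^)]+)\)$")

def parse_bot_user_agent(ua):
    """Return a dict with name, version and url, or None if the string has another shape."""
    match = BOT_UA.match(ua.strip())
    return match.groupdict() if match else None

info = parse_bot_user_agent("msnbot/0.3 (+http://search.msn.com/msnbot.htm)")
print(info)
```

A string that does not follow this shape, such as an ordinary browser User-Agent or an empty header, simply returns `None`. Note that a User-Agent is self-reported, so it identifies what a visitor claims to be rather than proving it.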

Here's the kicker

So now you're saying, "So what? Big deal." But this really is a big deal. It's a big deal not only because the URLs this visitor was requesting no longer exist, but because the only place these URLs can be found is in Google's search results using site:www.sitename.com. A similar query on MSN Search doesn't show the URLs at all, even on the beta version of their new Microsoft search engine. But within just hours of the visitor's exit from the site, the same search on Microsoft's new search engine shows all of the URLs in question fully indexed in its results.



My Theory On This Mysterious Microsoft Crawler

The old MSN required a fee to be crawled by its spider. But a few months back MSN dropped the fee and said it was going to begin crawling the entire web without charge. However, that's no easy task. So I believe MSN is using the results from Google, and possibly even Yahoo, to find all of the pages they've indexed on sites that have a relatively low page count in the current MSN search engine.

First off, that's the fastest way to get the relevant pages from a web site. Sure, they could just go to the site directly and start crawling, but in doing so they're going to get tons of duplicate URLs and URLs that seem different but point to the same content. Crawling Google's results will cut the bandwidth needed to some extent but will not completely take care of the duplicate content issue their spider will encounter.

Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By creating a baseline number of pages per site when the new Microsoft Search is launched and running a comparison at regular intervals for the next 6 months, they'll be able to determine internally whether their engine is finding and indexing the same links, and as many links, as Google. Call it competitive analysis or whatever you want.

So Microsoft's Screen Scraping?

Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the URLs it finds. It makes sense as a business case, but I wonder if there are any legal issues there. I doubt it. It's like putting garbage out on the curb: once it's out there it's fair game. But I bet Google's lawyers would have more to say than that on the case.

Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.

Jason's article first appeared on his blog MarketingShift.com.


TOPICS: Business/Economy; Culture/Society; News/Current Events
KEYWORDS: google; internetexploiter; microsnot; underweartootight
To: proust
"And it doesn't make the 100 million he gave to the UN vanish either"
Gates has done more to stop malaria in Africa than the entire EU.
As of today, he is busy funding the development of a malaria vaccine.
I don't know anyone who has done more to improve the lives of people in third world countries than Gates has.
81 posted on 11/11/2004 3:02:49 PM PST by KwasiOwusu
[ Post Reply | Private Reply | To 72 | View Replies]

To: Darksheare

Quite annoying, isn't he?

Still, ZOT I do believe is unnecessary.

He'll learn, or leave.


82 posted on 11/11/2004 3:02:56 PM PST by K1avg
[ Post Reply | Private Reply | To 78 | View Replies]

To: KwasiOwusu

Still distributing your histrionics, I see.

Please note Post #73.


83 posted on 11/11/2004 3:03:35 PM PST by K1avg
[ Post Reply | Private Reply | To 77 | View Replies]

To: Knitebane; KwasiOwusu

He has no intention of answering something he knows he cannot.


84 posted on 11/11/2004 3:06:11 PM PST by Darksheare (Personality shattered and horribly twisted, the humor flows out through the cracks.)
[ Post Reply | Private Reply | To 79 | View Replies]

To: KwasiOwusu
Gates has done more to stop malaria n Africa than the entire EU.

And more to promote abortion in Africa and India than the entire EU.

And more to remove the rights of gun owners.

And require government funded health care.

Ad nauseam.

But supporting one good thing apparently makes the dozens of evil things just peachy with you, eh?

85 posted on 11/11/2004 3:06:17 PM PST by Knitebane
[ Post Reply | Private Reply | To 81 | View Replies]

To: Knitebane
"I notice he hasn't answered this question"

I don't even have to.
That's not what this debate is about.
It's about whether the original poster has bothered to check his unfounded rumours with the guys at Microsoft Research, who are responsible for developing Microsoft search technology.
What is at issue is whether enough care has been taken to check the facts of this story before posting FUD on an Internet board.
Anyone can post anything they like.
That doesn't make it fact, does it?
86 posted on 11/11/2004 3:06:32 PM PST by KwasiOwusu
[ Post Reply | Private Reply | To 79 | View Replies]

To: mhking
Said in my best little-kid voice...

Microsoft is a copy cat!
Microsoft is a copy cat!
Microsoft is a copy cat!

87 posted on 11/11/2004 3:06:36 PM PST by MCH
[ Post Reply | Private Reply | To 1 | View Replies]

To: reagandemo
"Microsoft did not write the original DOS. They purchased it at some ridiculous price from someone else." True, but did you know that when Bill Gates sold it to IBM he did not own it?

Microsoft bought DOS from Seattle Computer Products for $50,000.

They didn't sell DOS to IBM; they licensed it and received royalties from every IBM PC sold, because it came with PC-DOS (IBM's modification of MS-DOS).
88 posted on 11/11/2004 3:07:07 PM PST by jimthewiz (California conservative in a bright red county)
[ Post Reply | Private Reply | To 25 | View Replies]

To: Darksheare; KwasiOwusu
Yeah, I think he saw through your set-up.

You're gonna need some better troll bait, I'm thinking.

We've got us a well-trained troll here. He's been taught not to let the discussion revolve around facts and technical analysis. That's a sure fire way for him to get spanked quickly.

89 posted on 11/11/2004 3:09:27 PM PST by Knitebane
[ Post Reply | Private Reply | To 84 | View Replies]

To: All

Does anyone know who is scanning the domain name search stream and registering popular domain names before the searcher can?


90 posted on 11/11/2004 3:10:02 PM PST by granite (THIS IS NOT A DRILL. THE NEXT FOUR YEARS START TODAY.)
[ Post Reply | Private Reply | To 88 | View Replies]

To: jimthewiz
And they licensed it to IBM before they bought it from Seattle Computer Products.

Don't they call that larceny?

91 posted on 11/11/2004 3:11:08 PM PST by Knitebane
[ Post Reply | Private Reply | To 88 | View Replies]

To: Knitebane
"And more to promote abortion in Africa and India than the entire EU"

No he hasn't.
Show me one country in Africa where Gates has gone in and promoted abortion.
I can give you at least 5 countries off the top of my head that Gates or his dad have visited to set up clinics for treatment of AIDS and malaria.
Mozambique, Botswana and South Africa, Ghana and Nigeria come to mind.
On the other hand, the EU is the place where they support abortion like nowhere else.
92 posted on 11/11/2004 3:11:35 PM PST by KwasiOwusu
[ Post Reply | Private Reply | To 85 | View Replies]

To: K1avg

Very annoying.
He will learn, as you said, one way or the other.


93 posted on 11/11/2004 3:12:06 PM PST by Darksheare (Personality shattered and horribly twisted, the humor flows out through the cracks.)
[ Post Reply | Private Reply | To 82 | View Replies]

To: KwasiOwusu
"That's not what this debate is about. It's about whether the original poster has bothered to check his unfounded rumours with the guys at Microsoft Research, who are responsible for developing stealing for Microsoft search technology."

There. Fixed it for ya.

94 posted on 11/11/2004 3:13:04 PM PST by Knitebane
[ Post Reply | Private Reply | To 86 | View Replies]

To: Knitebane
"He's been taught not to let the discussion revolve around facts and technical analysis"

You have not posted a single fact or "technical analysis."
Nor have you answered my questions about the authenticity of this piece of FUD.
95 posted on 11/11/2004 3:13:50 PM PST by KwasiOwusu
[ Post Reply | Private Reply | To 89 | View Replies]

To: KwasiOwusu

Yes, it IS the point of the debate.
IP addies are registered to people, or corporations.
A web admin can see who is scanning (pinging) them by their ip addy.
If there is an addy registered to Microsoft scanning you, you will be able to tell by IP lookup.
A Google admin noticing they are being sniffed repeatedly by a Microsoft business IP would be kinda suspicious.
Got it yet?


96 posted on 11/11/2004 3:14:34 PM PST by Darksheare (Personality shattered and horribly twisted, the humor flows out through the cracks.)
[ Post Reply | Private Reply | To 86 | View Replies]

To: KwasiOwusu

Ah, yes, but saying Mr. Gates is less liberal than Google doesn't quite disprove it, now does it? In fact, if the debate's about your supposed "FUD," then why precisely have you spent the majority of your posts defending your assertion that Gates is better politically to us FR types?


97 posted on 11/11/2004 3:15:12 PM PST by K1avg
[ Post Reply | Private Reply | To 86 | View Replies]

To: Knitebane; KwasiOwusu

I just tried to explain it to him as simply as possible.
We'll see what quaddity troll we have here.


98 posted on 11/11/2004 3:15:23 PM PST by Darksheare (Personality shattered and horribly twisted, the humor flows out through the cracks.)
[ Post Reply | Private Reply | To 89 | View Replies]

To: KwasiOwusu

And you state this right after saying the question to you about what you know on IP addies, ping/ack sync packets has no bearing?
You really cannot possibly be that blind, can you?


99 posted on 11/11/2004 3:16:29 PM PST by Darksheare (Personality shattered and horribly twisted, the humor flows out through the cracks.)
[ Post Reply | Private Reply | To 95 | View Replies]

To: KwasiOwusu
You have not posted a single fact or "technical analysis"

Right-oh, Captain Barnabus. Unless, of course, you count the original article. The facts are right there, and the logical conclusion is drawn out. You've failed to disprove said logical conclusion.

And calling it "FUD" a million times doesn't help your case. If I wanted to hear BS like that over and over, I could've gone to DU.

100 posted on 11/11/2004 3:17:34 PM PST by K1avg
[ Post Reply | Private Reply | To 95 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.
