Free Republic
Browse · Search
News/Activism
Topics · Post Article


Microsoft Crawling Google Results For New Search Engine?
WebProNews ^ | 11.11.04

Posted on 11/11/2004 1:35:03 PM PST by mhking



Jason Dowdell | Contributing Writer

2004-11-11



I was questioned today by a developer who was watching a particular IP address scan his site. The IP was 65.54.188.86 and is registered to Microsoft Corp., located at One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the web server the normal header information associated with a crawler, such as an HTTP robot name, other identifying info, or even a browser name.


The behavior it demonstrated made it look like a crawler, especially since it was spidering urls that were no longer in existence (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left urls uncrawled) and doing so at the rate of 1 page every 3 - 5 seconds. The visitor started their visit at 7:37 am and was still on the site at 12:00 pm.
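
As a rough check on the scale of that visit, the figures above can be turned into a request count. This is only a sketch: it assumes the pace of one page every 3 to 5 seconds held for the whole window from 7:37 am to 12:00 pm, which the article does not confirm, and the function name is made up for illustration.

```python
from datetime import datetime

def estimate_requests(start, end, min_gap_s, max_gap_s):
    """Return (low, high) request-count estimates for a visit window.

    start/end are "HH:MM" strings on the same day; the gaps are the
    assumed fastest and slowest seconds between page requests.
    """
    fmt = "%H:%M"
    duration_s = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    low = int(duration_s // max_gap_s)   # slowest pace gives the fewest requests
    high = int(duration_s // min_gap_s)  # fastest pace gives the most requests
    return low, high

low, high = estimate_requests("07:37", "12:00", 3, 5)
print(f"roughly {low} to {high} requests")
```

Under those assumptions the window is 4 hours 23 minutes (15,780 seconds), which works out to somewhere between about 3,156 and 5,260 page requests.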

Correction: the data was there after all. Here's the crawler info: msnbot/0.3 (+http://search.msn.com/msnbot.htm)
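
For anyone checking their own logs, a crawler string in that shape can be split into its parts. This is a minimal sketch that assumes the "name/version (+info URL)" layout of the string quoted above; the pattern and function name are illustrative, and other crawlers format their identifiers differently.

```python
import re

# Matches strings shaped like: name/version (+http://info.url)
BOT_UA = re.compile(
    r'^(?P<name>[^/\s]+)/(?P<version>\S+)\s+\(\+(?P<info_url>[^)\s]+)\)$'
)

def parse_bot_user_agent(user_agent):
    """Return a dict with name, version and info_url, or None if the
    string does not follow the assumed crawler layout."""
    match = BOT_UA.match(user_agent.strip())
    return match.groupdict() if match else None

print(parse_bot_user_agent("msnbot/0.3 (+http://search.msn.com/msnbot.htm)"))
```

A typical browser string has no "+URL" part in parentheses, so it returns None here rather than being mistaken for a crawler.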

Here's the kicker

So now you're saying, so what, big deal. But this really is a big deal. It's a big deal not only because the urls this visitor was requesting don't exist any longer, but because the only place these urls can be found is in Google's search results using site:www.sitename.com. A similar query on MSN Search doesn't show the urls at all, even on the beta version of their new Microsoft search engine. But then, within just hours of the visitor's exit from the site, the same search at Microsoft's new search engine showed all of the urls in question fully indexed within its results.



My Theory On This Mysterious Microsoft Crawler

The old MSN required a fee to be crawled by its spider. But a few months back MSN dropped the fee and said they were going to begin crawling the entire web and doing it without charge. However, that's no easy task. So I believe MSN is using the results from Google and possibly even Yahoo to get all of the pages they've indexed on sites that have a relatively low page count in the current MSN search engine.

First off, that's the fastest way to get the relevant pages from a web site. Sure, they could just go to the site directly and start crawling, but in doing so they're going to get tons of duplicate urls and urls that seem different but point to the same content. Crawling Google's results will reduce the bandwidth needed to some extent but will not completely take care of the duplicate content issue their spider will encounter.

Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By creating a baseline number of pages per site when the new Microsoft Search is launched and running a comparison on a regular interval for the next 6 months, they'll be able to determine internally if their engine is finding and indexing the same links and as many links as Google. Call it competitive analysis or whatever you want.
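
The comparison described here amounts to set arithmetic on url lists. The sketch below is hypothetical: the function name and the example.com urls are invented for illustration, and nothing in the article says Microsoft measures coverage this way.

```python
def coverage(baseline_urls, indexed_urls):
    """Compare one engine's index against a baseline list of urls.

    Returns (fraction of baseline urls also indexed, set of missing urls).
    """
    baseline = set(baseline_urls)
    indexed = set(indexed_urls)
    if not baseline:
        return 0.0, set()
    found = baseline & indexed    # urls both lists contain
    missing = baseline - indexed  # urls only the baseline has
    return len(found) / len(baseline), missing

# Hypothetical data: urls a baseline engine lists for a site vs. a new engine.
baseline = [
    "http://www.example.com/a",
    "http://www.example.com/b",
    "http://www.example.com/c",
    "http://www.example.com/d",
]
new_engine = ["http://www.example.com/a", "http://www.example.com/c"]

ratio, missing = coverage(baseline, new_engine)
print(f"{ratio:.0%} of baseline urls indexed; missing: {sorted(missing)}")
```

Running the same comparison at regular intervals would show whether the gap to the baseline is closing over time.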

So Microsoft's Screen Scraping?

Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the urls it finds. It makes sense as a business case, but I wonder if there are any legal issues there. I doubt it. It's like putting garbage out to the curb. Once it's out there it's fair game, but I bet Google's lawyers would have more to say than that on the case.

Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.

Jason's article first appeared on his blog MarketingShift.com.


TOPICS: Business/Economy; Culture/Society; News/Current Events
KEYWORDS: google; internetexploiter; microsnot; underweartootight
Navigation: use the links below to view more comments.
first previous 1-20 ... 121-140 141-160 161-180 181-200 next last
To: KwasiOwusu

Yep.

You are using the beta, though. It will be slightly different depending on the different data centers for different queries.

Since the regular MSN is using Yahoo! results, MSN will typically be the same as Yahoo!. (At least until the beta goes full release)


161 posted on 11/11/2004 4:28:45 PM PST by invoman
[ Post Reply | Private Reply | To 158 | View Replies]

To: mhking
Everyone,
I clearly started off on the wrong foot on this one.
Too many heated words.
I have to apologize to anyone I have offended in this thread.
I am off to have my dinner, a bit wiser than I was this morning I hope.
162 posted on 11/11/2004 4:29:32 PM PST by KwasiOwusu
[ Post Reply | Private Reply | To 1 | View Replies]

To: invoman
You are right about Yahoo and the main MSN search engine.
163 posted on 11/11/2004 4:31:39 PM PST by KwasiOwusu
[ Post Reply | Private Reply | To 161 | View Replies]

To: KwasiOwusu

No problems. We're all still friends. I think.

You had a fairly defensible position, you just failed to sufficiently defend it, resorting to tongue-lashings instead. A mistake we all make at times.

If you'll allow me to be a bit condescending, learn to watch the tongue and you'll be fine.

Have a great evening.


164 posted on 11/11/2004 4:33:42 PM PST by K1avg
[ Post Reply | Private Reply | To 162 | View Replies]

To: Revel
Microsoft did not write the original DOS. They purchased it at some ridiculous price from someone else. They got the idea for Windows from someone else too. They stole most of the good features in IE from Netscape. So Microsoft is known for not having original ideas.

Maybe so, but the one good thing Microsoft did was establish fairly open software and hardware standards that allowed for explosive growth of the PC market, which eventually led to the internet market. As a comparison look at what IBM did with the PS/2 and Microchannel. Or Apple if you like.

165 posted on 11/11/2004 5:01:17 PM PST by Moonman62 (Federal Creed: If it moves tax it. If it keeps moving regulate it. If it stops moving subsidize it.)
[ Post Reply | Private Reply | To 8 | View Replies]

To: K1avg

53 - I just tried the new MS search engine, and searched on Google, and the first articles I tried to read, it turns out I had to enroll at those web sites before I could even check out the article they had posted.

MS can keep their new search engine.


166 posted on 11/11/2004 5:02:40 PM PST by XBob (Free-traitors steal our jobs for their profit.)
[ Post Reply | Private Reply | To 53 | View Replies]

To: KwasiOwusu
Here, I did a Google search and found this:

Microsoft, Google Square Off in Search Arena

167 posted on 11/11/2004 5:03:17 PM PST by TexKat (Just because you did not see it or read it, that does not mean it did or did not happen.)
[ Post Reply | Private Reply | To 147 | View Replies]

To: mhking
207.46.98.72 - - [07/Nov/2004:05:20:31 -0600] "GET /robots.txt HTTP/1.0" 200 311
207.46.98.72 - - [07/Nov/2004:05:20:34 -0600] "GET /joms/servlet/sItem?itemId=2069&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:05:53:56 -0600] "GET /joms/servlet/sItem?itemId=1234&classId=2B HTTP/1.0" 404 722
207.46.98.72 - - [07/Nov/2004:06:03:48 -0600] "GET /joms/servlet/sItemInfo?itemId=1062&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:21:34 -0600] "GET /joms/servlet/sItemInfo?itemId=2070&classId=2B HTTP/1.0" 404 734
207.46.98.72 - - [07/Nov/2004:06:23:30 -0600] "GET /joms/servlet/sItemInfo?itemId=1145&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:27:09 -0600] "GET /joms/servlet/sItemInfo?itemId=1122&classId=2B HTTP/1.0" 404 734
207.46.98.72 - - [07/Nov/2004:06:30:38 -0600] "GET /joms/servlet/sItemInfo?itemId=1050&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:32:47 -0600] "GET /joms/servlet/sItemInfo?itemId=2069&classId=2B HTTP/1.0" 404 734
207.46.98.72 - - [07/Nov/2004:06:34:01 -0600] "GET /joms/servlet/sItemInfo?itemId=1037&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:35:48 -0600] "GET /joms/servlet/sItem?itemId=2054&classId=2B HTTP/1.0" 404 722
207.46.98.72 - - [07/Nov/2004:06:38:28 -0600] "GET /joms/servlet/sItem?itemId=1251&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:41:50 -0600] "GET /joms/servlet/sItem?itemId=1122&classId=2B HTTP/1.0" 404 722
207.46.98.72 - - [07/Nov/2004:06:43:43 -0600] "GET /joms/servlet/sItem?itemId=1062&classId=2B HTTP/1.0" 500 634
Lots more skipped

The joms servlet was something I worked on a while back and took off the site 3/60days ago. Is this the kind of info you are looking for?
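
For anyone who wants to check their own logs the same way, entries in this layout can be tallied by status code. A minimal sketch, assuming every line follows the format shown above (IP, two placeholder fields, bracketed timestamp, quoted request, status, size); the pattern and function names are illustrative.

```python
import re
from collections import Counter

# One access-log line in the layout shown in the post above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def status_counts(lines):
    """Count responses per HTTP status code, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[int(entry["status"])] += 1
    return counts

# Three of the lines quoted above, used as sample input.
SAMPLE = [
    '207.46.98.72 - - [07/Nov/2004:05:20:31 -0600] "GET /robots.txt HTTP/1.0" 200 311',
    '207.46.98.72 - - [07/Nov/2004:05:20:34 -0600] "GET /joms/servlet/sItem?itemId=2069&classId=2B HTTP/1.0" 500 634',
    '207.46.98.72 - - [07/Nov/2004:05:53:56 -0600] "GET /joms/servlet/sItem?itemId=1234&classId=2B HTTP/1.0" 404 722',
]

print(status_counts(SAMPLE))
```

Of the fourteen lines quoted above, only the robots.txt request returned 200; the rest are seven 500s and six 404s, which fits a crawler requesting pages that are no longer on the site.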

168 posted on 11/11/2004 5:06:46 PM PST by jpsb (MAN)
[ Post Reply | Private Reply | To 1 | View Replies]

To: KwasiOwusu
Well I like the new Microsoft Search Engine and I'm making it my default. It's clean, fast and easy to use. I like the drop down box next to the query field where you can choose Web, News, Dictionary, Encyclopedia, etc.
169 posted on 11/11/2004 5:10:48 PM PST by SamAdams76 (Red Sox Win The World Series...And Bush Wins Re-election Too!)
[ Post Reply | Private Reply | To 76 | View Replies]

To: KwasiOwusu; mhking
Microsoft Search Encounters Glitches on First Day
170 posted on 11/11/2004 5:19:12 PM PST by TexKat (Just because you did not see it or read it, that does not mean it did or did not happen.)
[ Post Reply | Private Reply | To 147 | View Replies]

To: Knitebane
William H. Gates III is a left-wing socialist

Melinda French has had a bad (socialist) effect on him, but he and his family have always leaned left of center. To his credit he doesn't ram politics down your throat.

By the way, we're lucky as a society that a business-minded guy like Gates won out instead of Patterson or Kildall in the early days of DOS. I don't think a lot of what we take for granted would have been developed as quickly if there hadn't been a common standard. I was there and remember what the hobbyist computer arena was like back then...very fragmented and ego-driven, sometimes self-destructively so.
171 posted on 11/11/2004 5:26:08 PM PST by JayNorth
[ Post Reply | Private Reply | To 42 | View Replies]

To: papasmurf

This is the same microsoft that forgot to renew their ownership of hotmail.com


172 posted on 11/11/2004 5:30:23 PM PST by N3WBI3
[ Post Reply | Private Reply | To 15 | View Replies]

To: Knitebane

That's my recollection also.


173 posted on 11/11/2004 5:40:15 PM PST by JLO
[ Post Reply | Private Reply | To 149 | View Replies]

To: Moonman62
Maybe so, but the one good thing Microsoft did was establish fairly open software and hardware standards that allowed for explosive growth of the PC market, which eventually led to the internet market.

You are correct as to the cause and effect, you just have the wrong company in mind.

IBM established the fairly open software and hardware specifications. They later changed their minds with MCA, but the original open specification was an IBM thing. Microsoft had nothing to do with it.

And when IBM dropped the ball by trying the MCA mess, Phoenix, Compaq and Intel were right there to pick it up and run with it. And again, Microsoft had nothing to do with it.

About Apple, I agree with you. That's why I won't buy an Apple. I dumped Microsoft to get away from a proprietary, closed software platform. Why jump onto a closed, proprietary hardware platform instead?

174 posted on 11/11/2004 5:46:44 PM PST by Knitebane
[ Post Reply | Private Reply | To 165 | View Replies]

To: mhking

Screw Google...nest of liberal scum...love it


175 posted on 11/11/2004 5:50:11 PM PST by antaresequity
[ Post Reply | Private Reply | To 1 | View Replies]

To: JayNorth
I was there too, and I remember what Bill Gates did to the community that gave him his start.

Bill Gates built an empire off of the backs of the members of the Homebrew Computer Club.

And Microsoft has never been about standards. IBM was about standards. Intel was about standards. Microsoft was about proprietary, closed protocols and formats that they spammed out until enough people used them that the rest had to comply just to be able to communicate with them.

176 posted on 11/11/2004 5:50:48 PM PST by Knitebane
[ Post Reply | Private Reply | To 171 | View Replies]

To: antaresequity

As opposed to Microsoft...nest of liberal scum?


177 posted on 11/11/2004 5:51:51 PM PST by Knitebane
[ Post Reply | Private Reply | To 175 | View Replies]

To: KwasiOwusu

"Yep. A very proud owner of Microsoft stock."

===

So can we assume as being very proud, you are either a current or past employee?

And, if so, (LOL) why the heck is IE so screwed up in its update downloads? Really messes up in XP Pro.


178 posted on 11/11/2004 5:56:05 PM PST by JLO
[ Post Reply | Private Reply | To 145 | View Replies]

To: KwasiOwusu

Oh, my. You just got here. And you're trashing Google, in favor of MicroHurl. Do read the forum for a while before you post, OK?

Microsoft is not the favorite son of FR.


179 posted on 11/11/2004 6:11:41 PM PST by MineralMan (godless atheist)
[ Post Reply | Private Reply | To 19 | View Replies]

To: KwasiOwusu; mhking; All

I have to apologize to anyone I have offended in this thread.
I am off to have my dinner, a bit wiser than I was this morning I hope.

===

Tomorrow is another day, eh? My advice, which you obviously already learned, is read around a bit before attacking.


180 posted on 11/11/2004 6:12:13 PM PST by JLO
[ Post Reply | Private Reply | To 162 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson