Microsoft Crawling Google Results For New Search Engine?
WebProNews | 11.11.04
Posted on 11/11/2004 1:35:03 PM PST by mhking
Jason Dowdell | Contributing Writer
2004-11-11
A developer came to me today with questions about a particular IP address he had been watching scan his site. The IP was 65.54.188.86, registered to Microsoft Corp., One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the header information normally associated with a crawler, such as an HTTP robot name or other identifying info, or even a browser name.
Its behavior looked like a crawler's, especially since it was spidering URLs that no longer exist (search engine spiders crawl site segments at regular intervals and often come back for URLs an initial crawl left uncrawled), and it was doing so at a rate of one page every 3 to 5 seconds. The visitor arrived at 7:37 am and was still on the site at 12:00 pm.
Correction: the identifying data was there after all. Here's the crawler's user-agent string: msnbot/0.3 (+http://search.msn.com/msnbot.htm)
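If you want to check your own logs for this visitor, here is a minimal Python sketch. It assumes you already have each request's raw User-Agent string; the pattern and the `msnbot_version` name are my own illustration, not anything MSN documented. Keep in mind that the User-Agent is self-reported: a match tells you what a client claims to be, not what it actually is.

```python
import re
from typing import Optional

# Hypothetical pattern: matches the "msnbot/<version>" token seen in the
# user-agent string quoted above, e.g. "msnbot/0.3 (+http://search.msn.com/msnbot.htm)".
MSNBOT_RE = re.compile(r"\bmsnbot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def msnbot_version(user_agent: str) -> Optional[str]:
    """Return the claimed msnbot version, or None if the token is absent.

    The user-agent header is supplied by the client, so a match only
    tells you what the client *claims* to be.
    """
    match = MSNBOT_RE.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(msnbot_version("msnbot/0.3 (+http://search.msn.com/msnbot.htm)"))
    print(msnbot_version("Mozilla/4.0 (compatible; MSIE 6.0)"))
```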
Here's the kicker
So now you're saying: so what, big deal? But this really is a big deal, not only because the URLs this visitor requested no longer exist, but because the only place these URLs can be found is in Google's search results for site:www.sitename.com. A similar query on MSN Search doesn't show the URLs at all, not even on the beta version of Microsoft's new search engine. Yet within just hours of the visitor's exit from the site, the same search on Microsoft's new search engine shows all of the URLs in question fully indexed in its results.
My Theory On This Mysterious Microsoft Crawler
The old MSN charged sites a fee to be crawled by its spider. But a few months back, MSN dropped the fee and said it would begin crawling the entire web free of charge. That's no easy task. So I believe MSN is using results from Google, and possibly even Yahoo, to find all of the pages those engines have indexed for sites that have a relatively low page count in the current MSN search engine.
First off, that's the fastest way to get the relevant pages from a web site. Sure, they could go to the site directly and start crawling, but in doing so they'd pick up tons of duplicate URLs and URLs that look different but point to the same content. Crawling Google's results will reduce the bandwidth needed to some extent, but it won't completely solve the duplicate-content issue their spider will encounter.
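The duplicate-URL problem is usually attacked first with URL normalization. Below is a minimal Python sketch; the rules it applies are common heuristics I'm assuming for illustration, not how MSN, Google or Yahoo actually canonicalized URLs. Sorting query parameters can wrongly merge URLs that a server treats differently, and none of this catches genuinely different URLs that serve identical content, which needs a comparison of the pages themselves.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Reduce trivially different spellings of a URL to one key.

    Assumed heuristics: lowercase scheme and host, drop the default port,
    drop any user:password part and the #fragment, use "/" for an empty
    path, and sort query parameters. IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))


def dedupe(urls):
    """Keep the first URL seen for each canonical key, preserving order."""
    seen = {}
    for url in urls:
        seen.setdefault(canonicalize(url), url)
    return list(seen.values())
```

For example, `HTTP://Example.com:80/joms/servlet/sItem?itemId=2069&classId=2B#top` and `http://example.com/joms/servlet/sItem?classId=2B&itemId=2069` collapse to the same key.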
Secondly, crawling Google's results can serve as a benchmark for their new search engine. By recording a baseline number of pages per site when the new Microsoft Search launches and rerunning the comparison at regular intervals for the next 6 months, they'll be able to determine internally whether their engine is finding and indexing the same links as Google, and as many of them. Call it competitive analysis or whatever you want.
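Here is a minimal Python sketch of that kind of per-site comparison, assuming you already have one engine's list of indexed URLs for a site and a reference engine's list for the same site (how those lists are collected is left out, and ideally the URLs are normalized first). The `Coverage` and `compare` names are hypothetical. You would take one snapshot at launch as the baseline, repeat it at each interval, and watch whether `recall` climbs.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    """One snapshot comparing our index of a site with a reference engine's."""
    ours: int      # pages we have indexed for the site
    theirs: int    # pages the reference engine lists for the site
    shared: int    # pages present in both
    missing: set   # pages the reference engine has and we lack

    @property
    def recall(self) -> float:
        """Fraction of the reference engine's pages we also have (1.0 if it lists none)."""
        return self.shared / self.theirs if self.theirs else 1.0


def compare(our_urls, reference_urls) -> Coverage:
    """Compare two collections of URLs for the same site."""
    ours, theirs = set(our_urls), set(reference_urls)
    shared = ours & theirs
    return Coverage(len(ours), len(theirs), len(shared), theirs - ours)


if __name__ == "__main__":
    snapshot = compare(["/a", "/b"], ["/a", "/c", "/d"])
    print(snapshot.shared, sorted(snapshot.missing), round(snapshot.recall, 2))
```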
So Microsoft's Screen Scraping?
Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft could very well be screen scraping Google (or maybe even using its API, LOL) and crawling the URLs it finds. It makes sense from a business standpoint, but I wonder whether there are any legal issues there. I doubt it. It's like putting garbage out at the curb: once it's out there, it's fair game. But I bet Google's lawyers would have more to say about it than that.
Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.
Jason's article first appeared on his blog MarketingShift.com.
TOPICS: Business/Economy; Culture/Society; News/Current Events
KEYWORDS: google; internetexploiter; microsnot; underweartootight
To: KwasiOwusu
Yep.
You are using the beta, though. Results will vary slightly by data center for different queries.
Since the regular MSN uses Yahoo! results, MSN will typically match Yahoo! (at least until the beta goes to full release).
161 posted on 11/11/2004 4:28:45 PM PST by invoman
To: mhking
Everyone,
I clearly started off on the wrong foot on this one.
Too many heated words.
I have to apologize to anyone I have offended in this thread.
I am off to have my dinner, a bit wiser than I was this morning, I hope.
To: invoman
You are right about Yahoo and the main MSN search engine.
To: KwasiOwusu
No problems. We're all still friends. I think.
You had a fairly defensible position; you just failed to defend it sufficiently, resorting to tongue-lashings instead. A mistake we all make at times.
If you'll allow me to be a bit condescending, learn to watch the tongue and you'll be fine.
Have a great evening.
164 posted on 11/11/2004 4:33:42 PM PST by K1avg
To: Revel
Microsoft did not write the original DOS. They purchased it at some ridiculous price from someone else. They got the idea for Windows from someone else too. They stole most of the good features in IE from Netscape. So Microsoft is known for not having original ideas.
Maybe so, but the one good thing Microsoft did was establish fairly open software and hardware standards that allowed for explosive growth of the PC market, which eventually led to the internet market. As a comparison, look at what IBM did with the PS/2 and Micro Channel. Or Apple, if you like.
165 posted on 11/11/2004 5:01:17 PM PST by Moonman62
(Federal Creed: If it moves tax it. If it keeps moving regulate it. If it stops moving subsidize it.)
To: K1avg
53 - I just tried the new MS search engine and searched on Google, and with the first articles I tried to read, it turned out I had to enroll at those web sites before I could even check out the articles they had posted.
MS can keep their new search engine.
166 posted on 11/11/2004 5:02:40 PM PST by XBob
(Free-traitors steal our jobs for their profit.)
To: KwasiOwusu
167 posted on 11/11/2004 5:03:17 PM PST by TexKat
(Just because you did not see it or read it, that does not mean it did or did not happen.)
To: mhking
207.46.98.72 - - [07/Nov/2004:05:20:31 -0600] "GET /robots.txt HTTP/1.0" 200 311
207.46.98.72 - - [07/Nov/2004:05:20:34 -0600] "GET /joms/servlet/sItem?itemId=2069&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:05:53:56 -0600] "GET /joms/servlet/sItem?itemId=1234&classId=2B HTTP/1.0" 404 722
207.46.98.72 - - [07/Nov/2004:06:03:48 -0600] "GET /joms/servlet/sItemInfo?itemId=1062&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:21:34 -0600] "GET /joms/servlet/sItemInfo?itemId=2070&classId=2B HTTP/1.0" 404 734
207.46.98.72 - - [07/Nov/2004:06:23:30 -0600] "GET /joms/servlet/sItemInfo?itemId=1145&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:27:09 -0600] "GET /joms/servlet/sItemInfo?itemId=1122&classId=2B HTTP/1.0" 404 734
207.46.98.72 - - [07/Nov/2004:06:30:38 -0600] "GET /joms/servlet/sItemInfo?itemId=1050&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:32:47 -0600] "GET /joms/servlet/sItemInfo?itemId=2069&classId=2B HTTP/1.0" 404 734
207.46.98.72 - - [07/Nov/2004:06:34:01 -0600] "GET /joms/servlet/sItemInfo?itemId=1037&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:35:48 -0600] "GET /joms/servlet/sItem?itemId=2054&classId=2B HTTP/1.0" 404 722
207.46.98.72 - - [07/Nov/2004:06:38:28 -0600] "GET /joms/servlet/sItem?itemId=1251&classId=2B HTTP/1.0" 500 634
207.46.98.72 - - [07/Nov/2004:06:41:50 -0600] "GET /joms/servlet/sItem?itemId=1122&classId=2B HTTP/1.0" 404 722
207.46.98.72 - - [07/Nov/2004:06:43:43 -0600] "GET /joms/servlet/sItem?itemId=1062&classId=2B HTTP/1.0" 500 634
Lots more skipped.
The joms servlet was something I worked on a while back and took off the site 3/60days ago. Is this the kind of info you are looking for?
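The lines above are in Apache's Common Log Format. Below is a minimal Python sketch for pulling one visitor's requests out of such a log, tallying the status codes, and estimating how often it hits the site. The regex assumes the plain Common Log Format shown here (no referrer or user-agent fields, which the Combined format would add), and the `parse`/`summarize` names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime
from statistics import median

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse(lines):
    """Yield (host, timestamp, path, status) for each line that matches CLF."""
    for line in lines:
        m = CLF_RE.match(line)
        if m:
            when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
            yield m["host"], when, m["path"], int(m["status"])


def summarize(lines, host):
    """Return (status-code counts, median seconds between requests) for one host."""
    hits = [h for h in parse(lines) if h[0] == host]
    statuses = Counter(status for _, _, _, status in hits)
    times = sorted(when for _, when, _, _ in hits)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return statuses, (median(gaps) if gaps else None)
```

Fed the first three log lines above with host `207.46.98.72`, this reports one each of 200, 500 and 404, with a median gap of 1002.5 seconds between requests.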
168 posted on 11/11/2004 5:06:46 PM PST by jpsb
(MAN)
To: KwasiOwusu
Well, I like the new Microsoft Search Engine and I'm making it my default. It's clean, fast and easy to use. I like the drop-down box next to the query field where you can choose Web, News, Dictionary, Encyclopedia, etc.
169 posted on 11/11/2004 5:10:48 PM PST by SamAdams76
(Red Sox Win The World Series...And Bush Wins Re-election Too!)
To: KwasiOwusu; mhking
170 posted on 11/11/2004 5:19:12 PM PST by TexKat
(Just because you did not see it or read it, that does not mean it did or did not happen.)
To: Knitebane
William H. Gates III is a left-wing socialist
Melinda French has had a bad (socialist) effect on him, but he and his family have always leaned left of center. To his credit, he doesn't ram politics down your throat.
By the way, we're lucky as a society that a business-minded guy like Gates won out instead of Patterson or Kildall in the early days of DOS. I don't think a lot of what we take for granted would have been developed as quickly if there hadn't been a common standard. I was there and remember what the hobbyist computer arena was like back then...very fragmented and ego-driven, sometimes self-destructively so.
To: papasmurf
This is the same Microsoft that forgot to renew its registration of hotmail.com.
172 posted on 11/11/2004 5:30:23 PM PST by N3WBI3
To: Knitebane
That's my recollection also.
173 posted on 11/11/2004 5:40:15 PM PST by JLO
To: Moonman62
Maybe so, but the one good thing Microsoft did was establish fairly open software and hardware standards that allowed for explosive growth of the PC market, which eventually led to the internet market.
You are correct as to the cause and effect; you just have the wrong company in mind.
IBM established the fairly open software and hardware specifications. They later changed their minds with MCA, but the original open specification was an IBM thing. Microsoft had nothing to do with it.
And when IBM dropped the ball by trying the MCA mess, Phoenix, Compaq and Intel were right there to pick it up and run with it. And again, Microsoft had nothing to do with it.
About Apple, I agree with you. That's why I won't buy an Apple. I dumped Microsoft to get away from a proprietary, closed software platform. Why jump onto a closed, proprietary hardware platform instead?
To: mhking
Screw Google...nest of liberal scum...love it
To: JayNorth
I was there too, and I remember what Bill Gates did to the community that gave him his start.
Bill Gates built an empire off of the backs of the members of the Homebrew Computer Club.
And Microsoft has never been about standards. IBM was about standards. Intel was about standards. Microsoft was about proprietary, closed protocols and formats that they spammed out until enough people used them that the rest had to comply just to be able to communicate with them.
To: antaresequity
As opposed to Microsoft...nest of liberal scum?
To: KwasiOwusu
"Yep. A very proud owner of Microsoft stock."
===
So can we assume that, being very proud, you are either a current or past employee?
And, if so, (LOL) why the heck is IE so screwed up in its update downloads? Really messes up in XP Pro.
178 posted on 11/11/2004 5:56:05 PM PST by JLO
To: KwasiOwusu
Oh, my. You just got here. And you're trashing Google, in favor of MicroHurl. Do read the forum for a while before you post, OK?
Microsoft is not the favorite son of FR.
179 posted on 11/11/2004 6:11:41 PM PST by MineralMan
(godless atheist)
To: KwasiOwusu; mhking; All
I have to apologize to anyone I have offended in this thread.
I am off to have my dinner, a bit wiser than I was this morning, I hope.
===
Tomorrow is another day, eh? My advice, which you've obviously already learned, is to read around a bit before attacking.
180 posted on 11/11/2004 6:12:13 PM PST by JLO
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson