Free Republic
Browse · Search
General/Chat
Topics · Post Article


Google Spanner — instamatic redundancy for 10 million servers?
The Register (UK) | 23rd October 2009 21:40 GMT | Cade Metz in San Francisco

Posted on 11/02/2009 8:19:30 AM PST by Ernest_at_the_Beach

Mountain View wants your exabyte

Google’s massively global infrastructure now employs a proprietary system that automatically moves and replicates loads between its mega data centers when traffic and hardware issues arise.

The distributed technology was first hinted at — in classically coy Google fashion — during a conference this summer, and Google fellow Jeff Dean has now confirmed its existence in a presentation (PDF) delivered at a symposium earlier this month.

The platform is known as Spanner. Dean’s presentation calls it a “storage and computation system that spans all our data centers [and that] automatically moves and adds replicas of data and computation based on constraints and usage patterns.” This includes constraints related to bandwidth, packet loss, power, resources, and “failure modes”.
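The presentation names the kinds of constraints (bandwidth, packet loss, power, resources) but not how Spanner acts on them. As a purely hypothetical illustration of "adding replicas based on constraints and usage patterns", here is a greedy Python sketch: discard data centers that violate hard limits, then place replicas at the busiest remaining sites. The `DataCenter` fields, thresholds, and ranking rule are all assumptions for illustration, not Google's design.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    # Hypothetical per-site metrics; field names are illustrative only.
    name: str
    bandwidth_gbps: float
    packet_loss: float        # fraction of packets lost, 0.0-1.0
    spare_power_kw: float
    requests_per_sec: float   # observed demand near this site

def place_replicas(sites, replicas, min_bandwidth=10.0, max_loss=0.01, min_power=50.0):
    """Greedy placement (an assumed strategy): drop sites that break any
    hard constraint, then keep the busiest survivors so data sits near demand."""
    eligible = [
        s for s in sites
        if s.bandwidth_gbps >= min_bandwidth
        and s.packet_loss <= max_loss
        and s.spare_power_kw >= min_power
    ]
    if len(eligible) < replicas:
        raise ValueError("not enough sites satisfy the constraints")
    eligible.sort(key=lambda s: s.requests_per_sec, reverse=True)
    return [s.name for s in eligible[:replicas]]

sites = [
    DataCenter("dc-a", 40.0, 0.001, 200.0, 900.0),
    DataCenter("dc-b", 40.0, 0.050, 300.0, 1200.0),  # too lossy
    DataCenter("dc-c", 20.0, 0.002, 80.0, 400.0),
    DataCenter("dc-d", 5.0, 0.001, 500.0, 1500.0),   # too little bandwidth
    DataCenter("dc-e", 10.0, 0.005, 60.0, 700.0),
]
print(place_replicas(sites, replicas=3))  # ['dc-a', 'dc-e', 'dc-c']
```

Re-running the placement as measured metrics change would shift replicas between sites, which is the flavor of behavior Dean describes; the real system presumably weighs many more signals, including failure modes.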

Dean speaks of an “automated allocation of resources across [Google’s] entire fleet of machines” — and that's quite a fleet. Google now has at least 36 data centers across the globe — though a handful may still be under construction. And as Data Center Knowledge recently noticed, the goal is to span a far larger fleet.

According to Dean’s presentation, Google is intent on scaling Spanner to between one million and 10 million servers, encompassing 10 trillion (10^13) directories and a quintillion (10^18) bytes of storage. And all this would be spread across “100s to 1000s” of locations around the world.
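To put those targets in perspective, the arithmetic below divides the stated totals evenly across the low and high ends of the server range. Even distribution is an assumption made only for this back-of-the-envelope check; real placement would be uneven, and replication multiplies raw capacity.

```python
TOTAL_BYTES = 10**18        # a quintillion bytes (one exabyte)
TOTAL_DIRECTORIES = 10**13  # ten trillion directories

def per_server(servers):
    # Even split, assumed for illustration: (bytes, directories) per machine.
    return TOTAL_BYTES // servers, TOTAL_DIRECTORIES // servers

for servers in (10**6, 10**7):
    data, dirs = per_server(servers)
    print(f"{servers:,} servers: {data / 10**9:,.0f} GB and {dirs:,} directories each")
```

At one million servers that is about 1,000 GB (1 TB) per machine; at 10 million it is about 100 GB, before counting replicas.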

Imagine that. A single corporation housing an exabyte of the world's data across thousands of custom-built data centers.

[Figure: Google Spanner, “Google’s 10-million-server vision”]

Dean declined to discuss the presentation with The Reg. And Google’s PR arm has yet to respond to specific questions about the Spanner setup. But Google senior manager of engineering and architecture Vijay Gill alluded to the technology during an appearance at the cloud-happy Structure 09 mini-conference in San Francisco earlier this year.



TOPICS: Business/Economy; Computers/Internet
KEYWORDS: google; hitech
This is an Excerpt

*********************

1 posted on 11/02/2009 8:19:32 AM PST by Ernest_at_the_Beach

To: Ernest_at_the_Beach

Google = Obama’s NASA.

I do not trust them at all.


2 posted on 11/02/2009 8:21:04 AM PST by Frantzie (Judge David Carter - democrat & dishonorable Marine like John Murtha.)

To: ShadowAce
Lots of hardware being used by Google.....

Maybe Tilera & Quanta can benefit...

Related thread:

Quanta opens servers to 100-core Tilera

3 posted on 11/02/2009 8:21:57 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: Frantzie
Google = Obama’s NASA.

Where is your support for that statement?

4 posted on 11/02/2009 8:24:52 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: rdb3; Calvinist_Dark_Lord; GodGunsandGuts; CyberCowboy777; Salo; Bobsat; JosephW; ...

5 posted on 11/02/2009 8:30:28 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: All
The Next Page:

Google’s favorite sentence

6 posted on 11/02/2009 8:31:41 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: All
Another article from the Register on Google:

Google Caffeine: What it really is

******************************EXCERPT******************************

By Cade Metz in San Francisco

Posted in Servers, 14th August 2009 22:06 GMT

As it invites the world to play in a mysterious sandbox it likes to call "Caffeine," Google is testing more than just a "next-generation" search infrastructure. It's testing at least a portion of a revamped software architecture that will likely underpin all of its online applications for years to come.

Speaking with The Reg, über-Googler Matt Cutts confirms that the company's new Caffeine search infrastructure is built atop a complete overhaul of the company's custom-built Google File System, a project two years in the making. At least informally, Google refers to this file system redux as GFS2.

7 posted on 11/02/2009 8:40:36 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: All
Another article:

Google File System II: Dawn of the Multiplying Master Nodes

***************************************EXCERPT*************************************

A sequel two years in the making

By Cade Metz in San Francisco

Posted in Servers, 12th August 2009 02:12 GMT

Updated: As its custom-built file system strains under the weight of an online empire it was never designed to support, Google is brewing a replacement.

Apparently, this overhaul of the Google File System is already under test as part of the "Caffeine" infrastructure the company announced earlier this week.

In an interview with the Association for Computing Machinery (ACM), Google's Sean Quinlan says that nearly a decade after its arrival, the original Google File System (GFS) has done things he never thought it would do.

"Its staying power has been nothing short of remarkable given that Google's operations have scaled orders of magnitude beyond anything the system had been designed to handle, while the application mix Google currently supports is not one that anyone could have possibly imagined back in the late 90s," says Quinlan, who served as the GFS tech leader for two years and remains at Google as a principal engineer.

But GFS supports some applications better than others. Designed for batch-oriented applications such as web crawling and indexing, it's all wrong for applications like Gmail or YouTube, meant to serve data to the world's population in near real-time.

"High sustained bandwidth is more important than low latency," read the original GFS research paper. "Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response-time requirements for an individual read and write." But this has changed over the past ten years - to say the least - and though Google has worked to build its public-facing apps so that they minimize the shortcomings of GFS, Quinlan and company are now building a new file system from scratch.

With GFS, a master node oversees data spread across a series of distributed chunkservers. Chunkservers, you see, store chunks of data. They're about 64 megabytes apiece.
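That division of labor can be sketched as follows. A client turns a byte offset into a chunk index, asks the master only for metadata (the chunk handle and which chunkservers hold replicas), and then reads the bytes directly from a chunkserver. The class, handle strings, and server names below are hypothetical stand-ins; a real GFS master is not a Python dict.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size the article cites

class Master:
    """Hypothetical master: stores metadata only, never file contents."""
    def __init__(self):
        # (filename, chunk_index) -> (chunk_handle, [chunkserver names])
        self.chunk_table = {}

    def add_chunk(self, filename, index, handle, replicas):
        self.chunk_table[(filename, index)] = (handle, list(replicas))

    def lookup(self, filename, index):
        return self.chunk_table[(filename, index)]

def locate(master, filename, offset):
    """Client side: find the chunk holding a byte offset."""
    index = offset // CHUNK_SIZE   # which chunk
    within = offset % CHUNK_SIZE   # position inside that chunk
    handle, replicas = master.lookup(filename, index)
    return handle, replicas, within

master = Master()
master.add_chunk("/logs/crawl", 0, "h-0001", ["cs1", "cs4", "cs7"])
master.add_chunk("/logs/crawl", 1, "h-0002", ["cs2", "cs4", "cs9"])

# Byte 100,000,000 falls in the second chunk (index 1).
print(locate(master, "/logs/crawl", 100_000_000))
```

Because the master hands out only this small amount of metadata, the bulk data traffic flows between clients and chunkservers.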

8 posted on 11/02/2009 8:45:41 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: Ernest_at_the_Beach
Where is your support for that statement?

Ern, it's the Intarweb.

Nobody has to support anything.¹

_______________
¹ Martin Fierro, 2009
9 posted on 11/02/2009 8:57:41 AM PST by martin_fierro (< |:)~)

To: All
And another:

Hadoop - Why is Google juicing Yahoo! search?

********************************EXCERPT***********************************

Inside the Mountain View mind

By Cade Metz in San Francisco

Posted in Software, 9th April 2009 01:01 GMT

It's the Google equivalent of the everlasting gobstopper. And for some reason, the Mountain View Chocolate Factory has encouraged a knockoff industry among its Slugworthian rivals.

Considering the code of secrecy that typically envelops Google's internal operations, you have to wonder why the company helped foster the birth and ongoing development of Hadoop, the open-source incarnation of the new-age grid-computing platform that underpins its vast online infrastructure. Hadoop now drives at least a portion of Yahoo!'s search engine, and it runs Powerset, the basis for Microsoft's next-generation search extravaganza.

According to Christophe Bisciglia - the former Google engineer who recently jumped ship for the much-discussed Hadoop startup Cloudera - any advantages Hadoop bestows on Google's chief rivals are outweighed by the long-term benefits shoveled back into the Chocolate Factory. Famously, Hadoop is an educational tool for the next generation of Google Oompa Loompas, and in theory its widespread adoption will eventually shove more stuff through Google's own search engine - meaning Google can serve ads and make more money.

But, it seems, the old Google arrogance is also at play. In sharing its distributed-computing genius with the rest of the world, Bisciglia says, Google "showed the world that they were right."

In 2003 and 2004, Google published a pair of research papers describing its distributed file system, known as GFS, and its software framework for distributed data-crunching, known as MapReduce. And in short order, an independent developer named Doug Cutting launched an open-source project based on the two papers. He called it Hadoop after his son's yellow stuffed elephant.
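The MapReduce programming model can be illustrated in a single process. A map function emits key/value pairs, a shuffle groups the values by key, and a reduce function combines each group. This Python sketch runs the classic word-count example locally; it shows the model only, not the distributed machinery in Google's system or Hadoop, and the function names are illustrative.

```python
from collections import defaultdict

def map_words(document):
    # Map phase: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    # Reduce phase: combine all values that share a key.
    return word, sum(counts)

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:                      # map
        for key, value in mapper(record):
            groups[key].append(value)           # shuffle: group values by key
    return dict(reducer(k, v) for k, v in sorted(groups.items()))  # reduce

docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(map_reduce(docs, map_words, reduce_counts))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster the map and reduce calls run in parallel on many machines, and the shuffle moves intermediate data across the network.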

By early 2006, Yahoo! was toying with the project, and the Google rival soon put Cutting on the payroll, slowly rolling Hadoop into its back-end infrastructure. The open-source platform powers the new Yahoo! Search Webmap, a mega-app that builds a database of all known web pages – complete with all the metadata needed to, shall we say, understand them. According to Yahoo! Grid Computing Pooh-Bah Eric Baldeschwieler, the fledgling app draws its map 33 per cent faster than the company's previous system - on the same hardware.

Facebook has embraced Hadoop in similar fashion. Amazon is offering the platform as a web service over its AWS virtual data center. And even Microsoft is feeding off the project's open-sourciness, thanks to its recent purchase of Powerset.

But in a very different way, Hadoop has also become a valuable tool for Google itself.

*********************************PAGE 2**************************************

Big Data 101

When Christophe Bisciglia was still at Google, interviewing student engineers for admission to the Chocolate Factory, he was struck by how difficult it was for the uninitiated to grasp the company's multi-terabyte data transformations.

"I started to notice a repeating pattern when interviewing students," he tells The Reg. "I would say 'OK, that's a great solution to the problem, but what would you do if you had 1,000 times as much data?' And they would just stare out at me, blank. It wasn't that they weren't smart or talented. It's just they'd never had the exposure."

In the hopes of shrinking this education gap, Google sent Bisciglia back to his alma mater, the University of Washington, where he taught a course on "working with big data." And Hadoop was the teaching model.

Google ended up hiring about half the students who took the class. And after the company open-sourced the curriculum, the same course was picked up by several other universities, including MIT and Berkeley. "In the past, it took three to six months to get hires up to speed with how to work with [Google] technology," Bisciglia says. "But if schools are teaching this as part of the standard undergraduate curriculum, Google saved that three to six months - multiplied by thousands of engineers."

To further facilitate such education, the company set up a Hadoop cluster inside one of its (then top secret) data centers, offering access to researchers across the planet.

Yes, this also juices the Yahoo!s and the Microsofts of the world. But Google is fond of saying "what's good for the internet is good for Google."

"As a result of having this large-scale data-processing technology easily available in open-source form, it makes it easier for other businesses to create and publish more data," ex-Googler Christophe Bisciglia says. "The more data that other businesses create and publish, the more data Google can slurp up and make universally accessible and useful."

Why didn't Google just open-source MapReduce and GFS on its own? Bisciglia says the company mulled the idea "a little bit," but decided it was less than practical. "MapReduce and GFS are infinitely integrated with so many other systems. Trying to cleanly excise them would be a software engineering challenge that would take millions of man-hours. There would be no clean way to cut it out."

Plus, by the time Google got around to its mulling, Hadoop was already a thriving open source project. "It had a good community around it. It was seeing adoption at Yahoo! and Facebook," he says. "It wouldn't have been good for the community to have these two competing projects that do the same thing."

And, Bisciglia acknowledges, Google likes the fact that its internal platform is "just a little bit better."

Last year, Hadoop researchers set a record on Jim Gray's sort benchmark, sorting a terabyte of random data in three minutes across 900 machines. But shortly thereafter, Google couldn't help but pipe up with the claim that its very own MapReduce had done the job in just 60 seconds.

When it comes time to praise itself, Google isn't above lifting the code of secrecy.
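A terabyte sort spread over hundreds of machines depends on splitting the keys so that each machine's sorted output can simply be concatenated. Hadoop's TeraSort example does this with sampled split points that give each reducer its own key range. The Python sketch below simulates that range-partitioning idea with lists standing in for machines; the sample size, seed, and helper name are illustrative assumptions.

```python
import bisect
import random

def range_partition_sort(keys, machines, sample_size=100, seed=0):
    rng = random.Random(seed)
    # 1. Sample the keys and pick machines-1 split points from the sorted sample.
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    splits = [sample[i * len(sample) // machines] for i in range(1, machines)]
    # 2. Route each key to the machine that owns its key range.
    buckets = [[] for _ in range(machines)]
    for k in keys:
        buckets[bisect.bisect_right(splits, k)].append(k)
    # 3. Each machine sorts its own bucket independently (in parallel on a cluster).
    for b in buckets:
        b.sort()
    # 4. Ranges don't overlap, so concatenating the buckets is globally sorted.
    return [k for b in buckets for k in b], [len(b) for b in buckets]

rng = random.Random(1)
data = [rng.randrange(10**9) for _ in range(10_000)]
result, sizes = range_partition_sort(data, machines=4)
print(result == sorted(data), sizes)
```

The first value printed is `True`; the bucket sizes show how evenly the sampled split points divided the work among the simulated machines.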

10 posted on 11/02/2009 9:07:58 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: martin_fierro

LOL!!


11 posted on 11/02/2009 9:08:42 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: martin_fierro
Man....what is all of this stuff....:

Hadoop - From Wikipedia, the free encyclopedia

*********************************EXCERPT********************************

Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license.[1] It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

Hadoop is a top-level Apache project, being built and used by a community of contributors from all over the world.[2] Yahoo! has been the largest contributor[3] to the project and uses Hadoop extensively in its web search and advertising businesses.[4] IBM and Google have announced a major initiative to use Hadoop to support university courses in distributed computer programming.[5]

Hadoop was created by Doug Cutting (now a Cloudera employee)[6], who named it after his child's stuffed elephant. It was originally developed to support distribution for the Nutch search engine project.[7]
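Although Hadoop itself is written in Java, its Streaming utility lets any program that reads stdin and writes stdout serve as a mapper or reducer. The two exchange tab-separated key/value lines, and the framework sorts the mapper output by key before the reducer sees it. Below is a minimal Python pair in that style that counts hits per host in a web log. The log format is an assumption, and the functions take iterables of lines so they can be exercised without a cluster.

```python
from itertools import groupby

def mapper(lines):
    # Input (assumed format): "date host path" per line.
    # Output: "host<TAB>1", the key/value convention Hadoop Streaming uses.
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:        # skip malformed records
            yield f"{fields[1]}\t1"

def reducer(lines):
    # Hadoop sorts mapper output by key, so equal keys arrive adjacent;
    # groupby relies on that ordering to total each host's count.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for host, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{host}\t{sum(int(count) for _, count in group)}"

log = [
    "2009-11-02 google.com /search",
    "2009-11-02 yahoo.com /index",
    "2009-11-02 google.com /mail",
    "malformed",
]
# sorted() stands in for Hadoop's shuffle-and-sort step.
for line in reducer(sorted(mapper(log))):
    print(line)
```

On a cluster, the same functions would sit behind small scripts that read `sys.stdin`, passed to the streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the Hadoop version installed.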

12 posted on 11/02/2009 9:13:52 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: All
And who is Cloudera?

Bottling the Magic Behind Google and Facebook

*********************EXCERPT************************

March 16, 2009, 1:38 am

By Ashlee Vance

Christophe Bisciglia, Amr Awadallah, Jeff Hammerbacher and Mike Olson started their company, Cloudera, around Hadoop.

Cloudera is the quintessential Silicon Valley story.

Three of the top engineers from Google, Yahoo and Facebook have teamed up with an ex-Oracle executive to tackle the problems inherent in quickly analyzing big piles of data. On Monday, they’re revealing a commercial product based on the open source software Hadoop, which provides the analytical magic behind the world’s biggest Web sites. The team at Cloudera, based in Burlingame, Calif., think they can extend Web smarts to the business world, aiding companies in retail, insurance, bio-tech and oil and gas.

Hadoop is the open-source version of the file system and MapReduce technology developed by Google. Google has used such software to rewire its entire search index, making it possible for the company to run ever-faster searches on cheap servers and to ask questions of its vast data stores and receive coherent answers.

Rather than keeping data locked in a central database, Google spreads information across thousands of servers. Engineers can then send out requests to these servers via MapReduce and gain new insights into people’s searching behavior and the relationships between Web sites. Best of all, MapReduce keeps these complicated jobs humming along even when computers fail because of its ability to maintain a cohesive picture of all the systems.
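In the MapReduce design this resilience comes largely from re-execution: the master tracks every task, and when a worker fails, its task is rescheduled on another machine. The Python sketch below simulates that sequentially with a hypothetical `flaky_worker` that crashes on chosen machines. Worker names, the failure set, and the retry limit are all assumptions for illustration.

```python
class WorkerFailed(Exception):
    pass

def flaky_worker(machine, task, dead_machines):
    # Hypothetical worker: crashes whenever it runs on a "dead" machine.
    if machine in dead_machines:
        raise WorkerFailed(machine)
    return sum(task)  # the actual work: sum this split of numbers

def run_job(splits, machines, dead_machines, max_attempts=3):
    results = {}
    attempts_log = []
    for task_id, split in enumerate(splits):
        for attempt in range(max_attempts):
            # Round-robin assignment; each retry moves the task to the next machine.
            machine = machines[(task_id + attempt) % len(machines)]
            attempts_log.append((task_id, machine))
            try:
                results[task_id] = flaky_worker(machine, split, dead_machines)
                break
            except WorkerFailed:
                continue  # the master marks the task idle and reschedules it
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return sum(results.values()), attempts_log

splits = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
total, log = run_job(splits, ["m0", "m1", "m2"], dead_machines={"m1"})
print(total)  # 55
```

Task 1 first lands on the dead machine `m1`, fails, and is retried on `m2`, so the job still finishes with the correct total.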

While Google has kept the deep details on this technology to itself, the company did publish a couple of papers describing some of the underlying principles. That gave Doug Cutting, formerly a software consultant and now a Yahoo engineer, enough information to create an open-source take on the code.

Yahoo has since invested millions of dollars improving Hadoop and uses the technology to figure out what users should see on its home page, based on their surfing habits, and what ads to display next to search results.

Other Web 2.0 users, including Microsoft, Facebook and Fox Interactive Media, have picked up Hadoop as well.

13 posted on 11/02/2009 9:18:11 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: All
Continuing to follow this ....

Embedded link from just above:

Cloudera's Distribution for Hadoop

********************************EXCERPT*************************************

RPM, Debian, AWS & Automatic Configuration

Cloudera's Distribution for Hadoop is based on the most recent stable version of Apache Hadoop. It includes some useful patches back-ported from future releases, as well as improvements we have developed for our support customers.

Cloudera's Distribution includes everything you need to configure and deploy Hadoop using standard Linux system administration tools. This first release provides:

In addition, Cloudera's Configurator for Hadoop can generate optimized configuration files for your cluster based on a few simple questions. See the screencast on this page to learn how the Configurator works.
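The excerpt doesn't say which questions the Configurator asks or how it turns the answers into settings. As a hypothetical illustration of the general idea, this Python sketch turns two answers (node count and cores per node) into a Hadoop-style `<configuration>` XML document. The property names `dfs.replication`, `mapred.tasktracker.map.tasks.maximum`, and `mapred.tasktracker.reduce.tasks.maximum` are real settings in Hadoop releases of this era, but the sizing rules are assumptions, not Cloudera's logic.

```python
import xml.etree.ElementTree as ET

def generate_config(nodes, cores_per_node):
    # Hypothetical sizing rules, for illustration only:
    # replicate blocks up to 3 times but never more than there are nodes,
    # give each node one map slot per core and half as many reduce slots.
    settings = {
        "dfs.replication": min(3, nodes),
        "mapred.tasktracker.map.tasks.maximum": cores_per_node,
        "mapred.tasktracker.reduce.tasks.maximum": max(1, cores_per_node // 2),
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

print(generate_config(nodes=2, cores_per_node=8))
```

Capping replication at the node count matters on tiny clusters: asking for three copies of each block on a two-node cluster would leave every block permanently under-replicated.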

Cloudera's Distribution for Hadoop is released under the Apache 2.0 license, and is distributed for free through our public YUM and APT repositories. Our Distribution is well-tested on Red Hat variants including CentOS 5, RHEL5, and FC8, and Debian platforms such as Ubuntu.

Get Started Using Cloudera's Distribution for Hadoop

If you're running one of the supported Linux distributions and want to configure your own Hadoop cluster, we've posted instructions that will help you get moving with RPMs or Ubuntu / Debian Packages. If you'd like our help configuring your cluster, try out the Configurator.

Lastly, we've also posted instructions for running Cloudera's Distribution for Hadoop on Amazon's EC2.

14 posted on 11/02/2009 9:24:57 AM PST by Ernest_at_the_Beach ( Geert Wilders)

To: Ernest_at_the_Beach

Google = Obama’s NASA.

Where is your support for that statement?

“Net neutrality” bill: if you cannot see the connection, then you need to do more reading......

http://googlepublicpolicy.blogspot.com/search/label/Net%20Neutrality


15 posted on 11/02/2009 9:36:38 AM PST by color_tear

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson