Noah Johnson:
Folding Proteins at Home
By Barbara Gibson
In what is rapidly becoming a new world of democratic computing, Noah Johnson's three Power Macs are humming quietly away, helping Stanford University scientists solve a complex problem that, one day, may help them fight disease.
As part of a groundbreaking distributed computing experiment, Johnson and half a million other people are donating their spare computer capacity so Stanford can remotely simulate protein folding, an essential biochemical process that controls vital body functions.
The project, called Folding@home, represents a sort of ad hoc democracy because anyone with an Internet connection can join. Users simply download an application that runs protein-folding simulations on their desktop computers when the systems are idle.
Origami Gone Wrong
If you think of a cell as a house, proteins are everything that goes in it: the framing, the furniture, the fixtures. Proteins are the working parts of living matter.
The human body makes at least 50,000 different proteins, and each one assumes a particular shape, known as a fold, to carry out a particular function. Hemoglobin folds into a shape that lets it carry oxygen. Insulin fits like a key into spaces so it can turn things on and off. Other proteins fold into shapes that build bones, muscles, hair, skin or blood vessels.
When proteins don't fold properly (think of origami gone wrong), they can poison the cells around them and trigger diseases such as Alzheimer's, cystic fibrosis, an inherited form of emphysema and even many cancers.
Sharing the Workload
"Scientists have already sequenced the human genome, which is basically a blueprint for all of the proteins in biology," says Johnson, a computer programmer who folds at home as a hobby.
But analyzing a protein's possible folding steps as it crumples up into a 3-D knot is a daunting task, even for a supercomputer, because the molecular backbone of a protein can fold in trillions of different ways. While several supercomputers used together could handle the job, time slots on supercomputers are tight and very expensive.
Through Folding@home, scientists now have the horsepower to study the mechanics of protein folding. With its ability to share the workload economically among hundreds of thousands of computers, Folding@home can help scientists understand how proteins snap, or don't, into their predestined shapes, and may help to explain the origins of Alzheimer's and other, apparently unrelated diseases.
New Algorithm
Dr. Vijay Pande, a professor of chemistry and structural biology at Stanford, saw the potential for thousands of desktop computers to calculate tiny portions of a folding sequence. He wrote algorithms for Mac, Windows and Linux computers, and worked with distributed-computing entrepreneur Adam Beberg to integrate his code into an application dubbed Folding@home.
When the application is running on a Mac or other computer, Pande's software pulls simulation packets from Stanford servers, makes folding calculations on the Mac and reports the results back to the Stanford server.
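The fetch, compute and report cycle described above can be pictured with a short Python sketch. Everything in it is hypothetical: `ToyServer`, `WorkUnit` and `simulate` are illustrative names, the server is an in-memory stand-in and the "simulation" is a placeholder sum. It is not Folding@home's actual code or protocol.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkUnit:
    unit_id: int
    steps: int  # how many simulation steps this packet asks for


class ToyServer:
    """In-memory stand-in for the project's servers (hypothetical)."""

    def __init__(self, units):
        self.pending = deque(units)
        self.results = {}

    def fetch(self):
        # Hand out the next packet, or None when nothing is left.
        return self.pending.popleft() if self.pending else None

    def report(self, unit_id, value):
        self.results[unit_id] = value


def simulate(unit):
    # Placeholder for the real folding calculation: just a sum of squares.
    return sum(step * step for step in range(unit.steps))


def run_client(server):
    """Pull packets, compute, and report until the server has no more work."""
    completed = 0
    while (unit := server.fetch()) is not None:
        server.report(unit.unit_id, simulate(unit))
        completed += 1
    return completed


server = ToyServer([WorkUnit(1, 10), WorkUnit(2, 100)])
completed = run_client(server)
print(f"completed {completed} work units")
```

Because each packet is independent, many clients could run this same loop against one server without needing to talk to each other, which is the property that lets the workload spread across volunteers' machines.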
Johnson says he decided to get into folding because it helps research into diseases. "You don't have to be a scientist to help. You don't have to understand complex biological molecules to make a difference."
Processors of the World, Unite!
Folding@home isn't the world's first project that uses the spare capacity of thousands of computers in loosely linked networks. The same distributed computing concept fueled the discovery of the largest known prime numbers and cracked an RC5-64 encryption challenge.
The most famous distributed computing project, the Search for Extra-Terrestrial Intelligence (SETI@Home), uses millions of desktop computers to analyze radio telescope data in the ongoing search for extraterrestrial life.
Since Folding@home debuted in 2000, more than 500,000 computers across the globe have helped simulate the complete folding behavior, atom by atom, of five important proteins.
But How Valid?
To test the validity of the simulation, Pande and his team asked Folding@home volunteers to calculate the rate of folding in a well-understood protein known as BBA-5.
Then they compared the computer findings with physical tests on the same protein. Folding@home computers modeled the protein snapping into shape in about 6 millionths of a second, roughly the same amount of time the protein takes to form and fold in the lab.
Every day, Folding@home parcels out computational tasks among 43,000 desktop computers, demonstrating that distributed computing can be applied not only to mathematics, but also to problems scientists confront in their laboratories.
Friendly Competition
Protein folding isn't exactly sexy, so Pande nurtured volunteers' interest by setting up a competition for those donating the most computer time.
It worked.
Since the project began, volunteers have organized themselves into thousands of teams with names such as Dutch Power Cows, Overclockers Australia and Alliance Francophone. Some teams have just two people; others, thousands of members, all linked by the Internet.
Johnson is the point man for the Mac OS X team, which is made up of about 1,000 volunteers, all running Mac OS X.
Extreme Overclockers
Johnson hosts a web page on his Mac so new members can download the software and current members can check their progress, which is measured in points based on how many chunks of data a computer processes.
"I provide a place for the team to get together to discuss problems they're having, ways to improve folding and so on," Johnson says. "When the new Folding@home application came out for the Power Mac G5, we provided information on the team forum to help people use it."
The site also posts individual, team and overall project statistics so Mac OS X team members can see how they're doing against other teams. "In just the last two months," he says, "we moved from 30th to 24th place out of 2,000 teams worldwide in terms of our work unit production."
Bringing G5s Online
"And people are starting to bring G5s online," Johnson adds. "Before, on earlier Macs, they'd stay in one place in the stats; nobody would pass them and they wouldn't pass anybody. But people with G5s are flying by everyone. We just passed a German team and we'll be passing other teams as well."
Because of the way the software is written, Folding@home doesn't interfere with Johnson's normal use of his Mac. "When you're not using the computer," he explains, "the project will use everything you've got: all of the processor, 100% of the time."
"But if you want to use the computer," he points out, "the project will scale back and you won't see a hit in performance. When you're done, it goes back to 100%."
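One way to picture the behavior Johnson describes is a background job that only gets whatever processor time the user's own programs leave unused. The Python sketch below is a simplified model under that assumption, with a hypothetical function name. It does not show how the Folding@home client or the operating system's scheduler is actually implemented.

```python
def folding_cpu_share(user_load: float) -> float:
    """Fraction of one processor left over for the background job.

    user_load is the fraction (0.0 to 1.0) taken by the user's own programs.
    Assumes the background job runs at the lowest priority, so it never
    takes time away from foreground work.
    """
    if not 0.0 <= user_load <= 1.0:
        raise ValueError("user_load must be between 0.0 and 1.0")
    return 1.0 - user_load


# Idle machine, a half-busy machine, and a fully busy machine.
for load in (0.0, 0.5, 1.0):
    share = folding_cpu_share(load)
    print(f"user load {load:.0%} -> folding gets {share:.0%}")
```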
Folding on the Mac
"The reason I fold on a Mac," Johnson confides, "is that I work 40 hours a week at a Windows shop. I got tired of fixing computers all day long and then coming home and fixing computers. So I converted to Mac for sanity's sake."
"The Mac has been just head and shoulders over Windows as far as ease of use and not having to fix it all the time," Johnson says. "And the operating system is fun to use; with Windows, you have to be tinkering all the time and fixing registry entries. I didn't want to come home and do my job at home."
Also, Johnson says, performance is a factor. "Beberg originally ported the Linux version of Folding@home to the Mac," he says, "but he also optimized the software for the Power Mac G4, G5 and even the G3. We saw 200 and 300% speed increases with the new software."
Power Mac G5 Performance
"Let's say you have a 1GHz processor for your Pentium 4 and a 1GHz G5. Obviously the G5 runs faster than that, but they are baselined at 1GHz apiece. Because the software is optimized for the G5, it will get more work done than a Pentium. It actually manages to squeeze more out of the processor on a megahertz-to-megahertz basis.
"Now," Johnson says, "we can compete with AMD and Intel processors head to head. Since we launched the new core, we've seen an increase in points very clearly. Which means we're handling larger work units more quickly."
Power by the People
Even if he were given exclusive access to all of the world's supercomputers, Pande still wouldn't have as much processing power as he gets from the supercluster of people's desktop systems that Folding@home relies on. Modern supercomputers are essentially clusters of hundreds of processors linked by fast networking. But Pande needed the power of hundreds of thousands of processors, not just hundreds.
Since processors in Power Mac G5s and other desktop systems are now comparable to the processors in supercomputers, Pande has access to much greater processing power when he can tap the power of hundreds of millions of desktop systems that are sitting idle during some part of each day.
Folding@home needs the power of hundreds of thousands of processors rather than network speed, so it actually runs more effectively on a supercluster of desktop systems than it would on a supercomputer.
Even so, it took the supercluster 50,000 runs (2,000 years of computer time) to crack the virtual folding of a single protein.
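To put those figures in perspective, the short Python calculation below breaks them down. It is a back-of-the-envelope illustration only. It takes the article's 2,000 years and 50,000 runs, borrows the 43,000-computer daily figure quoted earlier, and assumes every machine works full time with the load spread evenly, which the article does not claim.

```python
DAYS_PER_YEAR = 365.25

computer_years = 2_000   # total computer time, from the article
runs = 50_000            # number of simulation runs, from the article
computers = 43_000       # daily active machines, from the article

total_days = computer_years * DAYS_PER_YEAR

# Average computer time consumed by a single run.
days_per_run = total_days / runs

# Wall-clock time if the work were split evenly across every machine
# (an idealized assumption, not a reported result).
wall_clock_days = total_days / computers

print(f"average computer time per run: {days_per_run:.1f} days")
print(f"idealized wall-clock time: {wall_clock_days:.1f} days")
```

Under these assumptions, a job that would occupy a single computer for two millennia shrinks to a matter of weeks.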
Power Mac G5 Supercomputer
Virginia Tech made supercomputing history when it became the first to combine 64-bit Power Mac G5s into the world's third-fastest supercomputer. Based entirely on 1,100 dual-processor Power Mac G5s and off-the-shelf technologies from Apple partners, Virginia Tech's new supercomputer, named System X, not only can go toe-to-toe with the fastest custom-designed supercomputers, it can beat most of them.
It delivers more than 10 teraflops of actual performance, providing massive scientific computing power to Virginia Tech scientists at the best price/performance of any supercomputer on the Top 500 list.
"The G5 was a perfect fit for the architectural goals of our system," says Virginia Tech's Dr. Srinidhi Varadarajan. "It has a 64-bit processor with two double-precision floating-point units, excellent memory bandwidth and an I/O architecture that allows us to interconnect it into a supercomputer.
"There is really nothing much different you can do on most of the custom-designed supercomputers that you cannot do on this system, and you cannot do better on this system."
Xgrid: Supercomputing Made Easy
Turning a Mac cluster into a supercomputer has just been simplified with Xgrid, a computational clustering technology from Apple's Advanced Computation Group.
Xgrid helps scientists and others working in compute-intensive environments to fully utilize all IT resources, including desktops and servers. Just as Folding@home takes advantage of unused computing capacity, Xgrid automatically discovers, connects and manages tasks across available systems in a Mac cluster.