GPUs to increase Folding@Home processing power up to 500x
Wolfgang Gruener and Mark Raby
September 29, 2006 13:03
Palo Alto (CA) - Stanford University's Folding@Home project today announced new software that enables the use of graphics cards within the distributed computing project. Project leaders will tap into the floating point horsepower of graphics chips and hope to see a massive jump in processing power, which could lead to more research results in less time.
Whenever there is a need for enormous processing power, scientists typically turn to supercomputers, if available. And when even supercomputers aren't enough, projects have expanded beyond clusters to distributed computing - as, for example, the SETI@Home ("Search for Extraterrestrial Intelligence") program of the University of California, Berkeley, and the Folding@Home project of Stanford University have done.
Folding@Home, launched in 2000 by Stanford Associate Professor Vijay Pande, researches the "folding" process of proteins - a term that refers to the assembling and reassembling of proteins. This biological process does not always happen perfectly, and if "proteins get screwed up," said Pande, biomedical problems and diseases can develop - such as Alzheimer's disease, one of Pande's research interests.
Computer simulation of these folding processes can help researchers learn more about them and find cures for, or even prevent, certain diseases. The problem is that these simulations consume huge amounts of processing power. "This scenario has become a 25-year nightmare," Pande said. Researchers are far from being able to calculate folding processes in real time: In fact, 1 ns of folding currently takes about 1 day to calculate - so simulating even 1 millisecond would take a million days, or more than 2700 years, on "one fast processor," he said.
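The scale of that gap can be made concrete with a quick back-of-the-envelope calculation. The 1 ns-per-day rate is the article's figure; the arithmetic below simply extrapolates from it, and shows that a millisecond of simulated folding corresponds to a million days - roughly the "more than 2700 years" figure.

```python
# Back-of-the-envelope check of the quoted figures: one fast CPU
# (ca. 2006) simulates about 1 ns of protein folding per day.
NS_PER_DAY = 1.0

def days_to_simulate(folding_seconds):
    """Wall-clock days needed to simulate `folding_seconds` of folding."""
    return folding_seconds * 1e9 / NS_PER_DAY

def years_to_simulate(folding_seconds):
    return days_to_simulate(folding_seconds) / 365.25

# 1 millisecond of folding works out to a million days:
print(days_to_simulate(1e-3))   # 1000000.0 days
print(years_to_simulate(1e-3))  # ~2737.9 years
```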
"Even if the government would give us everything they have in computing power, it wouldn't be enough," he explained. Some simulations could still take up to 40 years to yield results. Folding@Home was conceived to reach many more machines that could run calculations, and, according to Pande, it has been a great success - with more than 200,000 active computers worldwide crunching numbers at this time.
Dual-core processors promised to bring another increase in horsepower, but Pande said that "twice" the performance doesn't cut it: "We need 30 or 40 times the speed to turn months into days," he said.
The project members looked into options for increasing processing speed and ended up at solutions such as Clearspeed's accelerator card, which provides about 100 GFlops, or roughly four times the performance of a current Core 2 Duo chip. But at $5000 even in volume, the cards aren't quite what one would call affordable. Pande now believes he has found a solution by tapping into the capabilities of modern graphics cards, which are monsters in terms of floating point performance: A new client released today supports ATI's X1900 and X1950 graphics cards, which can unleash about 375 GFlops - about 20 to 40 times the speed the project has seen from CPUs so far. The group has also improved the Folding@Home software algorithm, which Pande expects will bring another 10-15x improvement, for a total maximum performance increase of about 500x when ATI's graphics cards are used. However, Pande conceded that the graphics cards may only be able to deliver a sustained 100 GFlops.
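Multiplying the two claimed gains together shows how the headline number is reached. The ranges are the article's; only the combination is worked out here.

```python
# Combining the per-client GPU speedup with the algorithmic improvement
# claimed for Folding@Home.
gpu_speedup = (20, 40)    # GPU client vs. CPU client (quoted range)
algo_speedup = (10, 15)   # improved software algorithm (quoted range)

low = gpu_speedup[0] * algo_speedup[0]    # 200x
high = gpu_speedup[1] * algo_speedup[1]   # 600x

# The ~500x headline figure sits inside this combined range.
print(f"combined speedup: {low}x - {high}x")
```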
Folding@Home Project
At this time, the beta client is limited to the X1900 series of graphics cards; the researcher said that X1800 cards will be supported soon. The group will also be leveraging the Playstation 3 with its powerful Cell processor. A client for the PS3 was already shown in August, but, according to Pande, applications for the Cell aren't easy to program.
So, what about Nvidia cards? According to Pande, the group has not been able to get the software to work on Nvidia chips.
Andy Keane, general manager of visualization applications at Nvidia, said in response to the ATI/Stanford announcement that general-purpose graphics processing units (GPGPUs) have so far been "fundamentally flawed" in the sense that there has not been a lot of "commercial exploitation with GPUs as a processor."
He mentioned that Nvidia wants to change this situation and considers the GPGPU market as "exciting" and something that "the company has been looking at for years." He stated that he had no personal knowledge of the development of a Folding@Home client for the Nvidia platform, but stressed that the company has a "long-standing relationship with Stanford."
At least as far as we know, Nvidia cards were in fact used for general processing projects before ATI came into the picture. One of the early projects was the now-defunct BionicFX, which used GeForce 6800 processors to accelerate audio processing. ATI publicly mentioned the possibility of using graphics processors for applications other than graphics shortly before the launch of the X1800 graphics card series. Such an approach, which ATI called "load balancing," could one day run, for example, physics effects on consumer and enthusiast PCs. Nvidia outlined a similar approach earlier this year for future SLI systems.
However, the fact that graphics chips excel particularly in floating point performance currently limits their general purpose use largely to scientific applications. More and more companies are entering this lucrative field and trying to answer the need for more processing power. For example, startup Peakstream last week announced an application interface layer that enables developers to add graphics cards to computer systems as a way to create "cheap" supercomputers.
http://www.tgdaily.com/2006/09/29/folding_at_home_to_use_gpus/
http://www.anandtech.com/video/showdoc.aspx?i=2849
The GPU Advances: ATI's Stream Processing & Folding@Home
Date: Sep 30, 2006
Type: Video Card
Manufacturer: ATI
Author: Ryan Smith
In the continual progression of GPU technology, we've seen GPUs become increasingly useful at generalized tasks as they have added flexibility for game designers to implement more customized and more expansive graphical effects. What started out as a simple fixed-function rendering process, where texture and vertex data were fed into a GPU and pixels were pushed out, has evolved into a system where a great deal of processing takes place inside the GPU. The modern GPU can be used to store and manipulate data in ways that go far beyond just quickly figuring out what happens when multiple textures are mixed together.
What GPUs have evolved into today are devices that are increasingly similar to CPUs in their ability to do more things, while still specializing in only a subset of abilities. Starting with Shader Model 2.0 on cards like the Radeon 9700 and continuing with Shader Model 3.0 and today's latest cards, GPUs have become floating-point powerhouses able to do most floating-point calculations many times faster than a CPU, a necessity as 3D rendering is a very FP-intensive process. At the same time, we have seen GPUs add programming constructs like looping and branching, previously only used on CPUs, but crucial to letting programmers use GPU resources effectively. In short, today's GPUs have in many ways become extremely powerful floating-point processors that have been used for 3D rendering but little else.
Both ATI and NVIDIA have been looking to put the expanded capabilities of their GPUs to good use, with varying success. So far, the only types of programs that have effectively tapped this power - other than applications and games requiring 3D rendering - have been video related, such as video decoders, encoders, and video effect processors. In short, the GPU has been underutilized: many tasks are floating-point hungry while not visual in nature, and such programs have made little use of the GPU so far.
Meanwhile the academic world has been working on designing and utilizing custom-built floating-point hardware for years for their own research purposes. The class of hardware related to today's topic, stream processors, are extremely powerful floating-point processors able to process whole blocks of data at once, where CPUs carry out only a handful of numerical operations at a time. We've seen CPUs implement some stream processing with instruction sets like SSE and 3DNow!+, but these efforts still pale in comparison to what custom hardware has been able to do. This same progress was happening on GPUs, only in a different direction, and until recently GPUs remained untapped as anything other than a graphics tool.
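The distinction between scalar and stream processing can be sketched in plain Python. This is a conceptual illustration only, not real GPU or SIMD code: the `kernel`, `scalar_process`, and `stream_process` names are ours, and on actual stream hardware each element of the block would be handled by a separate execution unit rather than sequentially.

```python
# Conceptual sketch: a stream processor applies one kernel to a whole
# block of data, while a scalar CPU walks through elements one at a time.

def kernel(x):
    # Some floating-point work applied uniformly to every element.
    return x * x + 1.0

def scalar_process(data):
    """CPU-style: process one element per step."""
    out = []
    for x in data:
        out.append(kernel(x))
    return out

def stream_process(data):
    """Stream-style: the kernel is applied across the whole block.
    (Sequential here; on a GPU each element maps to a shader unit.)"""
    return list(map(kernel, data))

block = [0.0, 1.0, 2.0, 3.0]
# Same result, different execution model:
assert scalar_process(block) == stream_process(block)
```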
Today's GPUs have evolved into their own class of stream processors, sharing much in common with researchers' customized hardware, because the 3D rendering process is itself a streaming task. The key difference is that while GPU designers have cut a few corners, omitting functionality a custom processor would provide but 3D rendering doesn't need, they have by and large developed stream processors that are just as fast as custom hardware yet, thanks to economies of scale, many, many times cheaper than a custom design.
It's here that ATI is looking for new ideas on what to run on their GPUs as part of their new stream computing initiative. The academic world is full of such ideas, champing at the bit to run experiments on more than a handful of customized hardware designs. One such application, and one of the stars of today's announcement, is Folding@Home, a Stanford research project designed to simulate protein folding in order to unlock the secrets of diseases caused by flawed folding.
For several years now, Dr. Vijay Pande of Stanford has been leading the Folding@Home project in order to research protein folding. Without diving unnecessarily into the biology of his research: as proteins are produced from their basic building blocks - amino acids - they must go through a folding process to achieve the right shape to perform their intended function. However, for numerous reasons protein folding can go wrong, and when it does it can cause various diseases as malformed proteins wreak havoc in the body.
Although Folding@Home's research involves multiple diseases, the primary disease they are focusing on at this point is Alzheimer's Disease, a brain-wasting condition affecting primarily older people, in which they slowly lose the ability to remember things and think clearly, eventually leading to death. As Alzheimer's is caused by malformed proteins impairing normal brain functions, understanding how exactly Alzheimer's occurs - and more importantly how to prevent and cure it - requires a better understanding of how proteins fold, why they fold incorrectly, and why malformed proteins cause even more proteins to fold incorrectly.
The biggest hurdle in this line of research is that it's very computationally intensive: a processor can simulate only about 1 nanosecond of folding per day, so a single calculation can take 1 million days (that's over 2700 years) on a fast CPU. Coupled with this is the need to run multiple calculations in order to simulate the entire folding process, which can span upwards of several seconds. Even splitting this load among the processors in a supercomputer, the work is still too intensive to complete in any reasonable amount of time; even if all grant money given out by the United States government were put towards buying supercomputers, it wouldn't come close to being enough.
This is where the "@Home" portion of Folding@Home comes in. Needing even more computing power than they could hope to buy, the Folding@Home research team decided to try to spread processing to computers all throughout the world, in a process called distributed computing. Their hope was that average computer users would be willing to donate spare/unused processor cycles to the Folding@Home project by running the Folding@Home client, which would grab small pieces of data from their central servers and return them upon completion.
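The fetch/compute/return cycle described above can be sketched in a few lines. Everything below is hypothetical - the work-unit format and the two "server" functions are in-process stand-ins for Folding@Home's actual, unpublished protocol - but it captures the shape of a distributed computing client.

```python
import random

# --- Hypothetical stand-ins for the project's central servers ---
def fetch_work_unit():
    """Pretend server call: hand out a small piece of the simulation."""
    return {"id": random.randint(1, 10**6), "steps": 1000, "seed": 42}

def return_result(work_id, result):
    """Pretend server call: send the finished piece back."""
    print(f"work unit {work_id} returned: {result:.4f}")

# --- The client's core loop: fetch work, compute in spare cycles, return it ---
def run_client(iterations=3):
    for _ in range(iterations):
        unit = fetch_work_unit()
        rng = random.Random(unit["seed"])
        # Stand-in for the real folding computation.
        result = sum(rng.random() for _ in range(unit["steps"]))
        return_result(unit["id"], result)

run_client()
```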
The call for help was successful, as computer owners were more than willing to donate computer cycles to help with this research, and hopefully help in coming up with a way to cure diseases like Alzheimer's. Entire teams formed in a race to see who could get more processing done, including our own Team AnandTech, and the combined power of over two hundred thousand CPUs netted the Folding@Home project over 200 teraflops (one teraflop is one trillion floating-point operations per second) of sustained performance.
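Those two figures also imply the average per-machine throughput, which puts the GPU numbers elsewhere in this article in context. Both inputs are the article's; the division is ours.

```python
# Average sustained throughput per participating machine.
total_flops = 200e12    # ~200 teraflops sustained (article's figure)
machines = 200_000      # active CPUs (article's figure)

per_machine = total_flops / machines
# Roughly 1 GFLOPS per machine, versus hundreds of GFLOPS for one GPU.
print(f"{per_machine / 1e9:.1f} GFLOPS per machine on average")
```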
While this was a good enough start to do research, it still fell short of the kind of power the Folding@Home research group needed for the long runs they wanted, alongside the short-run research the Folding@Home community could handle. Additionally, as processors have recently hit a cap in clock speed, AMD and Intel have been moving to multi-core designs, which introduce scaling problems for the Folding@Home design and are not as effective as increasing clock speeds.
Since CPUs were not growing at speeds satisfactory for the Folding@Home research group, and they were still well short of their goal in processing power, the focus has since returned to stream processors, and in turn GPUs. As we mentioned previously, the massive floating-point power of a GPU is well geared towards doing research work, and in the case of Folding@Home, they excel in exactly the kind of processing the project requires. To get more computing power, Folding@Home has now turned towards utilizing the power of the GPU.
Modern GPUs such as the R580 core powering ATI's X19xx series have upwards of 48 pixel shading units, designed to do exactly what the Folding@Home team requires. With help from ATI, the Folding@Home team has created a version of their client that can utilize ATI's X19xx GPUs with very impressive results. While we do not have the client in our hands quite yet, as it will not be released until Monday, the Folding@Home team is saying that the GPU-accelerated client is 20 to 40 times faster than their clients just using the CPU. Once we have the client in our hands, we'll put this to the test, but even a fraction of this number would represent a massive speedup.
With this kind of speedup, the Folding@Home research group is looking to finally be able to run simulations involving longer folding periods and more complex proteins than they could run before, allowing them to research proteins that were previously inaccessible. This implementation also allows them to finally do some research on their own, without requiring the entire world's help, by building a cluster of (relatively) cheap video cards - something they've never been able to do before.
Unfortunately for home users, for the time being, the number of those who can help out by donating their GPU resources is rather limited. The first beta client to be released on Monday only works on ATI GPUs, and even then only works on single X19xx cards. The research group has indicated that they are hoping to expand this to CrossFire-enabled platforms soon, along with less-powerful ATI cards.
The situation for NVIDIA users, however, isn't as rosy: while the research group would like to expand support to the latest GeForce cards, their attempts at implementing GPU-accelerated processing on those cards have shown that NVIDIA's cards are too slow compared to ATI's to be used. Whether this is due to a subtle architectural difference between the two, or a result of ATI's greater emphasis on pixel shading in this generation of cards compared to NVIDIA's, we're not sure - but Folding@Home won't be coming to NVIDIA cards as long as the research group can't solve the performance problem.
Conclusion
The Folding@Home project is the first of what ATI hopes will be many projects and applications, both academic and commercial, that will be able to tap the power of GPUs. Given the results showcased by Folding@Home, the impact on applications that map well onto a GPU could be huge. In the future we hope to be testing technologies such as GPU-accelerated physics processing, which both ATI and NVIDIA have promised to support, along with other yet-to-be-announced applications that use stream processing techniques.
It's been a longer wait than we were hoping for, but we're finally seeing the power of the GPU unleashed as was promised so long ago, starting with Folding@Home. As GPUs continue to grow in abilities and power, it should come as no surprise that ATI, NVIDIA, and their CPU-producing counterparts are looking at how to better connect GPUs and other coprocessors to the CPU in order to further enable this kind of processing and boost its performance. As we see AMD's Torrenza technology and Intel's competing Geneseo technology implemented in computer designs, we'll no doubt see more applications make use of the GPU, in what could be one of the biggest single performance improvements in years. The GPU is not just for graphics any more.
As for our readers interested in trying out the Folding@Home research group's efforts in GPU acceleration and contributing towards understanding and finding a cure for Alzheimer's, the first GPU beta client is scheduled to be released on Monday. For more information on Folding@Home or how to use the client once it does come out, our Team AnandTech members over in our Distributed Computing forum will be more than happy to give a helping hand.