
To: HKMk23
Thoughts on the NVidia vs ATI religious war, from the F@H forum:

If we look at worst-case figures, comparing like for like:

nVidia G92 (full version, not the cut-down GT/GS versions) = 128 shaders (unified - each shader can execute all 5 instruction classes)
ATI R700 = 800 shaders (specialized - each shader can execute only one of the 5 instruction classes, which makes it equivalent, in the worst case (a workload with no parallelism across the 5 slots, or a compiler totally incapable of taking advantage of them), to 160 shaders).

So: nVidia G92 = 128 shaders, ATI R700 = 160 shaders (worst case), nVidia G200 = 240 shaders.

Assuming all optimizations and compilers are of equal quality, and that the ATI GPU gains no advantage from the compiler reordering execution to keep all available shaders busy, this should roughly be the performance ratio between these GPUs, clock for clock. In reality, ATI's 800 specialized shaders should put it at least 2x ahead of the G200, because the 160-shader equivalence holds only in the worst case, where effectively the same single instruction class is used repeatedly (unlikely in the extreme for any plausibly useful work).
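
To make the arithmetic concrete, here is a minimal sketch in Python; the packing-efficiency values are illustrative assumptions, not measurements of any actual F@H core:

G92_SHADERS = 128    # nVidia G92: unified shaders, each handles all 5 classes
G200_SHADERS = 240   # nVidia G200: unified shaders
R700_SHADERS = 800   # ATI R700: specialized shaders in 160 groups of 5

def r700_effective(packing):
    # packing = average fraction of each 5-wide group the compiler fills:
    # 0.2 is the worst case (1 slot of 5 used), 1.0 is perfect packing.
    return R700_SHADERS * packing

for packing in (0.2, 0.5, 1.0):
    eff = r700_effective(packing)
    print(f"R700 at {packing:.0%} packing ~= {eff:.0f} effective shaders: "
          f"{eff / G92_SHADERS:.1f}x the G92, {eff / G200_SHADERS:.1f}x the G200")

At 20% packing the R700 lands exactly at the 160-shader worst case quoted above (0.7x a G200); past roughly 60% packing is where the "at least 2x ahead of the G200" figure becomes plausible.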

Personally, I think we're all likely to die of old age before ATI make a compiler decent enough to leverage their GPUs to anything like their true potential - or, on the current evidence, even to the worst-case potential mentioned above.

Another thing worth considering is that nVidia are getting much higher clock speeds out of their 55nm chips (the G92b in the GTX+ card) than ATI are getting out of theirs in the R700. Vertical scaling (higher clock speeds) is linear, whereas horizontal scaling (more pipelines / greater parallelism) yields diminishing, roughly logarithmic returns - another fact that works against the R700. nVidia's brute-force approach is easier to optimize for, and higher clock speeds provide better performance scaling.
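
The linear-versus-logarithmic point can be illustrated with Amdahl's law, one standard model of the diminishing returns from adding parallel units; the 5% serial fraction below is an assumed figure, not a measured property of any F@H workload:

SERIAL_FRACTION = 0.05  # assumed share of the work that cannot be parallelized

def clock_speedup(clock_ratio):
    # A higher clock speeds up serial and parallel work alike: linear scaling.
    return clock_ratio

def parallel_speedup(n_units):
    # Amdahl's law: adding units only accelerates the parallel fraction.
    return 1.0 / (SERIAL_FRACTION + (1.0 - SERIAL_FRACTION) / n_units)

print(f"  2x clock          -> {clock_speedup(2.0):.2f}x overall")
for n in (2, 8, 32, 128):
    print(f"{n:3d}x parallel units -> {parallel_speedup(n):.2f}x overall")

Doubling the clock doubles throughput outright, while 128x the parallel hardware tops out near 17x under this assumption - the flattening curve behind the "logarithmic" characterization.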


The ATI 4870 has tremendous potential that will be realized as the Pande Group writes better cores for GPU utilization. Today it is a toss-up in F@H production between NVidia and ATI.

Of course, both cores are getting bonus points for their beta status, so comparisons are difficult.

Our best guess is that either of the very high end GPU cards should provide about 110 GFLOPs of crunching prowess. That's a lot of power compared to the recent past.
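
For a sense of scale, here is a sketch comparing that ~110 GFLOPs figure against theoretical peak single-precision throughput; the clock speeds and flops-per-clock values are the commonly quoted reference figures for these parts and are assumptions here:

def peak_gflops(shaders, shader_clock_ghz, flops_per_clock):
    return shaders * shader_clock_ghz * flops_per_clock

# ATI HD 4870 (R700): 800 ALUs at 750 MHz, MAD = 2 flops/clock
# nVidia GTX 280 (G200): 240 shaders at 1296 MHz, MAD+MUL = 3 flops/clock
cards = {"HD 4870": peak_gflops(800, 0.750, 2),
         "GTX 280": peak_gflops(240, 1.296, 3)}

for name, peak in cards.items():
    print(f"{name}: {peak:.0f} GFLOPs peak; ~110 GFLOPs of F@H work "
          f"is {110 / peak:.0%} of peak")

On those assumptions, 110 GFLOPs of sustained folding is only about 9-12% of either card's paper peak, which is why better cores and compilers still have so much room to help.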
80 posted on 08/18/2008 9:38:52 PM PDT by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120) Cure Alzheimer's!)


To: texas booster
Just for fun, I looked at the Top 500 Supercomputer list from June 2005.

http://www.top500.org/list/2005/06/100

A single high-end GPU, available at a retail store in the US for between $200 and $400, can produce enough output to place 2nd on this June 2005 list of supercomputers.

The fastest at the time was the BlueGene/L at Lawrence Livermore, clocking in at 136.80 Linpack GFlops. It has been upgraded since then and now holds the #2 spot. The single GPU also uses less power than the BlueGene/L.

We all know that there is no direct comparison possible or desired between the two cases. Just remember the following two points:

1. My comparison makes much more sense than virtually any Olympic judging in this year's Games, and ...

2. I am in Sales and Marketing.

So there, just for fun.

81 posted on 08/18/2008 9:59:08 PM PDT by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120) Cure Alzheimer's!)
