Replies

Cache coherence isn't much of a problem. I designed methods around this for the Cray X-MPs. On the other hand, memory latency is a big problem.

We used to joke that a Cray was about $1,000,000 worth of fast memory, $1,000,000 worth of fast CPU, and $28,000,000 worth of switches.

There are only three things needed to run at high speed: bandwidth, bandwidth, and bandwidth.

328 posted on 07/09/2003 12:38:20 PM PDT by Doctor Stochastic (Vegetabilisch = chaotisch is der Charakter der Modernen. - Friedrich Schlegel)

To: Doctor Stochastic

Cache coherence isn't much of a problem. I designed methods around this for the Cray X-MPs. On the other hand, memory latency is a big problem.

Latency is the big killer, but cache coherence is also a bit of a problem. With multiple processors simulating a vast number of parallel processes for this algorithm space, resource contention is so pervasive and fine-grained that the processors are constantly stepping on each other's caches. The way around this is to artificially partition the data space (not trivial) and use low-level process management APIs to tweak scheduling and affinity, but even that is a delicate balancing act: the cost of IPC must not kill the benefit of much better cache performance.
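To make the partition-plus-affinity idea concrete, here is a minimal sketch in Python. It is not the code described above. The names `partition` and `pin_to_cpu` are made up for illustration, and the pinning relies on `os.sched_setaffinity`, which only exists on Linux, so the helper falls back to doing nothing elsewhere.

```python
import os


def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, non-overlapping
    (start, stop) slices so each worker touches its own region of memory."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def pin_to_cpu(cpu):
    """Best-effort: restrict the calling process to a single CPU.

    Returns True if the affinity was changed, False if the platform lacks
    os.sched_setaffinity (an assumption: Linux only) or the CPU is not
    available to this process.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    if cpu not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {cpu})
    return True
```

Each worker process would call `pin_to_cpu(worker_id)` once at startup and then work only on `data[start:stop]` for its own slice from `partition(len(data), n_workers)`. Anything that crosses a slice boundary has to go through IPC, which is where the balancing act comes in.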

The nuisance of all of this is that making the code scale well on SMP or ccNUMA silicon makes the codebase more than twice as large as a vanilla implementation and makes code tweaking much more tedious. The speed improvement is very significant, though.

331 posted on 07/09/2003 2:00:24 PM PDT by tortoise (All these moments lost in time, like tears in the rain.)

To: Doctor Stochastic

But the Cray architecture is vector. (Now you are going to tell me what that means exactly.)

338 posted on 07/09/2003 6:29:08 PM PDT by js1138
