To: tortoise
Cache coherence isn't much of a problem. I designed methods around this for the Cray X-MPs. On the other hand, memory latency is a big problem.
We used to joke that a Cray was about $1,000,000 worth of fast memory, $1,000,000 worth of fast CPU, and $28,000,000 worth of switches.
There are only three things needed to run at high speed: bandwidth, bandwidth, and bandwidth.
328 posted on 07/09/2003 12:38:20 PM PDT by Doctor Stochastic
(Vegetabilisch = chaotisch is der Charakter der Modernen. - Friedrich Schlegel)
To: Doctor Stochastic
"Cache coherence isn't much of a problem. I designed methods around this for the Cray X-MPs. On the other hand, memory latency is a big problem."
Latency is the big killer, but cache coherence is also a bit of a problem. When multiple processors simulate a vast number of parallel processes for this algorithm space, resource contention is so pervasive and fine-grained that the processors are constantly stepping on each other's caches. The way around this is to artificially partition the data space (not trivial) and use low-level process management APIs to tweak scheduling and affinity, but even that is a clever balancing act, so that the cost of IPC doesn't kill the benefit of much better cache performance.
The nuisance of all of this is that making the code scale well on SMP or ccNUMA silicon makes the codebase more than twice as large as a vanilla implementation and makes code tweaking much more tedious. The speed improvement is very significant though.
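For anyone who wants to see the shape of the partition-and-pin idea, here is a minimal Python sketch. It is not the code discussed above, just an illustration under stated assumptions: the names `partition`, `pin_to_cpu`, `simulate_partition` and `run` are hypothetical, the per-item update is a stand-in for real work, and the affinity call (`os.sched_setaffinity`) only exists on Linux, so the sketch skips pinning elsewhere. Each worker touches only its own slice of the data and sends back a single number, which keeps IPC down to one small message per partition.

```python
import os
from multiprocessing import Pool


def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, non-overlapping slices."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def pin_to_cpu(worker_id):
    """Pin the calling process to one CPU, where the OS exposes that API."""
    if hasattr(os, "sched_setaffinity"):  # assumption: Linux-only call
        allowed = sorted(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {allowed[worker_id % len(allowed)]})


def simulate_partition(args):
    """Hypothetical worker: updates only its own slice, returns one small result."""
    worker_id, (start, stop), pin = args
    if pin:
        pin_to_cpu(worker_id)
    state = list(range(start, stop))  # data private to this worker
    for _ in range(3):  # stand-in for the real per-item simulation step
        state = [(x * 31 + 7) % 1009 for x in state]
    return sum(state)  # one small IPC message per partition


def run(n_items, n_workers):
    """Fan the partitions out to pinned worker processes and combine the results."""
    jobs = [(w, b, True) for w, b in enumerate(partition(n_items, n_workers))]
    with Pool(n_workers) as pool:
        return sum(pool.map(simulate_partition, jobs))
```

To try it, call something like `run(100_000, 4)` from under an `if __name__ == "__main__":` guard. The balancing act shows up in the partition size: slices that are too small mean more messages and more scheduling overhead, while slices that are too large leave processors idle.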
331 posted on 07/09/2003 2:00:24 PM PDT by tortoise
(All these moments lost in time, like tears in the rain.)
To: Doctor Stochastic
But the Cray architecture is vector. (Now you are going to tell me what that means exactly)
338 posted on 07/09/2003 6:29:08 PM PDT by js1138