Counting cores -- the multi-core low-down
Techworld ^ | 6/27/2005 | Russell Kay and Patrick Thibodeau

Posted on 07/25/2005 4:53:00 PM PDT by Flightdeck

In 1965, when he first set out what we now call Moore's Law, Gordon Moore (who later co-founded Intel) said the number of components that could be packed onto an integrated circuit would double every year or so (a figure later amended to 18 months, and recently amended again).

In 1971, Intel's 4004 CPU had 2,300 transistors. In 1982, the 80286 debuted with 134,000 transistors. Now, run-of-the-mill CPUs contain upward of 200 million transistors, and Intel is scheduled to release a processor with 1.7 billion transistors later this year.

For years, such progress in CPUs was clearly predictable: successive generations of semiconductor technology gave us bigger, more powerful processors built from ever-smaller transistors operating at ever-higher clock speeds. These smaller, faster transistors use less electricity, too.

But there's a catch. It turns out that as transistors shrink and operating voltages get lower, a significant amount of electricity simply leaks away and ends up as heat, requiring much more attention to processor cooling and limiting further gains in clock speed -- think of this as a thermal barrier.

To break through that barrier, processor makers are adopting a new strategy, packing two or more complete, independent processor cores, or CPUs, onto a single chip. This multi-core processor plugs directly into a single socket on the motherboard, and the operating system sees each of the execution cores as a discrete logical processor that is independently controllable. Having two separate CPUs allows each one to run somewhat slower, and thus cooler, and still improve overall throughput for the machine in most cases.

Designed for speed

From one perspective, this is merely an extension of the design thinking that has for several years given us n-way servers using two or more standard CPUs; we're simply making the packaging smaller and the integration more complete. In practice, however, this multi-core strategy represents a major shift in processor architecture that will quickly pervade the computing industry. Having two CPUs on the same chip, rather than plugged into two separate sockets, greatly speeds communication between them and cuts waiting time.

The first multi-core CPU from Intel is already on the market. By the end of 2006, Intel expects multi-core processors to make up 40 per cent of new desktops, 70 per cent of mobile CPUs and a whopping 85 per cent of all server processors that it ships. Intel has said that all of its future CPU designs will be multi-core. Intel's major competitors -- including AMD, Sun Microsystems and IBM -- each appear to be betting the farm on multi-core processors.

Besides running cooler and faster, multi-core processors are especially well suited to tasks that have operations that can be divided up into separate threads and run in parallel. On a dual-core CPU, software that can use multiple threads, such as database queries and graphics rendering, can run almost 100 per cent faster than it can on a single-CPU chip.
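To make that concrete, here is a minimal sketch (an editorial illustration, not part of the article) of the thread-level parallelism being described: the same job split across two threads so a dual-core CPU can run the halves side by side. It uses C++11's std::thread, which postdates the article, and the workload and names are purely illustrative.

    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Sum a half-open range [begin, end) of the data.
    double sum_range(const std::vector<double>& v, std::size_t begin, std::size_t end) {
        return std::accumulate(v.begin() + begin, v.begin() + end, 0.0);
    }

    int main() {
        std::vector<double> data(10000000, 1.0);
        const std::size_t mid = data.size() / 2;
        double lower = 0.0, upper = 0.0;

        // One thread per core on a dual-core CPU: the OS schedules each thread
        // on its own core, so the two halves are summed at the same time.
        std::thread t1([&] { lower = sum_range(data, 0, mid); });
        std::thread t2([&] { upper = sum_range(data, mid, data.size()); });
        t1.join();
        t2.join();

        std::cout << "total = " << (lower + upper) << "\n";
        return 0;
    }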

However, many applications that process in a linear fashion, including communications, backup and some types of numerical computation, won't benefit as much and might even run slower on a dual-core processor than on a faster single-core CPU.

Power and performance

Two users who have tested AMD's Opteron dual-core chips and moved them into production say they are getting close to double the processing performance of a single-core chip.

Neal Tisdale, vice president for research and development at NewEnergy Associates, an Atlanta-based firm that conducts intensive analytical testing for the natural gas industry, has been using the Opteron dual-core chips supplied on systems built by Sun.

Tisdale says Sun is putting in an address decoder for each CPU, which increases throughput on his four-way machines. Address decoding helps a CPU access memory more efficiently.

But some vendors limit the number of address decoders on the chip, and that crimps performance, says Tisdale. "It actually depends what [server] vendor you buy from as to how much dual-core does for you," he says.

Another industry that sees chip performance as a competitive edge is travel. Customers want to choose from hundreds of flight and hotel options when booking travel online, and it's the system's task to deliver them quickly, says Alan Walker, vice president of technology prototyping and integration at Sabre Holdings. The Southlake, Texas-based company operates Travelocity and other online travel-booking services.

"You can never be fast enough or cheap enough for this type of processing," says Walker. Sabre has always used Opteron chips and began testing the dual-core versions on HP's ProLiant servers, which were shipped to its outsourcer, Electronic Data Systems.

The IT team spent a week testing the system and put it into production two days before AMD officially released the chip in April. "There were no installation issues," Walker says. "They put the same system image on the dual-core machine, and it booted without any problems."

Walker says he anticipates a bright future for dual-core Opteron chips, as long as the software scales reliably across a large number of systems. "These dual-core Opterons are way ahead of anyone else," he says.


TOPICS: Business/Economy; Extended News; Miscellaneous; Technical
KEYWORDS: athlon64; multicore; multiprocessor
Okay, this article is nearly a month old, so my ulterior motive for posting it is this: I am in the market for a new desktop. I run 3-D, transient numerical calculations -- heat transfer and electrodynamics -- usually custom-built in C++. I have a 2,000-processor system to run the full-size versions, but I debug and run clips from home. So can all the PC nerds on FR tell me how much improvement a dual-core 3.0 GHz processor would give me over a single-core 3.6 GHz??? Thank you in advance.
1 posted on 07/25/2005 4:53:00 PM PDT by Flightdeck

To: Flightdeck
AMD is ahead of Intel on this one: their dual cores talk to each other at high speed, while Intel's cores are saddled by a slower bus.

Be that as it may, it all depends on your applications. If you can recompile your apps with dual processors in mind, then you should expect to see about a 2-2.5x speed improvement.

2 posted on 07/25/2005 4:57:00 PM PDT by Paradox (Its a good thing that even when you dismiss the existence of God, he doesn't dismiss you.)

To: Flightdeck

If the application is either multi-process or multi-threaded, then it "might" scale to as much as 90% of the combined clock speed; i.e., 3.0 GHz x 2 x 90% = 5.4 GHz.

On the other hand, it may fall flat on its face if it isn't coded properly. We hear all the time of "multi-threaded" applications that fail miserably due to improper attention to gating.
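("Gating" here presumably means synchronizing access to shared data. Below is a minimal, hypothetical C++ sketch of the idea -- an editorial illustration, not part of the original post: every update to the shared counter passes through a lock, which keeps the result correct but serializes the hot path, so the second core buys little for this particular loop.)

    #include <iostream>
    #include <mutex>
    #include <thread>

    int counter = 0;     // shared state touched by both threads
    std::mutex gate;     // the "gate" serializing access to it

    void work(int iterations) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(gate);  // correct, but serializes every update
            ++counter;
        }
    }

    int main() {
        std::thread a(work, 1000000);
        std::thread b(work, 1000000);
        a.join();
        b.join();
        // With the lock the total is always 2000000; without it the threads race
        // and the count comes out wrong.
        std::cout << counter << "\n";
        return 0;
    }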


3 posted on 07/25/2005 5:04:00 PM PDT by the_Watchman

To: Flightdeck

http://techreport.com/reviews/2005q2/athlon64-x2/index.x?pg=1

A good discussion with numerous benchmarks.


4 posted on 07/25/2005 5:05:44 PM PDT by cabojoe

To: Paradox

Are you trying to tell us that a Paradox runs 2.5 times faster than a single doc? :)


5 posted on 07/25/2005 5:09:37 PM PDT by the_Watchman

To: Flightdeck

My compiles would run about 25% faster on two cores than on a single core.

Your 3.0 GHz dual-core will therefore do the total work of a 3.75 GHz chip, but will ACT like two 1.875 GHz processors.

So the dual-core 3.0 GHz will get about 4% more work done in your multi-CPU-aware apps than the 3.6 GHz single-core (3.75 / 3.6 is roughly 1.04), but it will also run much nicer: while one core is handling a download, the other core can check your e-mail.


6 posted on 07/25/2005 5:14:58 PM PDT by ROTB

To: ROTB
Compiles can be disk-limited -- hence the much-lower-than-expected speedup.

It really depends on the application. If your application does a lot of physical disk I/O, then you could get a much better performance boost out of either more memory or faster disk technology.
7 posted on 07/25/2005 5:19:28 PM PDT by the_Watchman

To: the_Watchman; ROTB; Paradox; cabojoe

Thanks for the replies. Watchman, I must not be paying proper attention to gating, as you say, since I don't know what that is. But on your other point, I always dynamically allocate my variables in RAM, so I am not writing to and reading from disk. If I purchase the cheap Intel Pentium D, I will get 4 GB of RAM, which should be enough for most of my apps.

The other point is that my computations are easily parallelized across multiple processors: each grid point runs the same equation (except at the boundaries). So does that parallelism translate to dual core? Thanks again for the help.
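(A hypothetical illustration of that kind of decomposition -- an editorial addition, not part of the original post. If each interior grid point applies the same update, the interior rows can be split between two threads, one per core; the names and the 5-point averaging stencil below are stand-ins for the real equations.)

    #include <thread>
    #include <vector>

    // A 2-D grid of field values; the alias is illustrative only.
    using Grid = std::vector<std::vector<double>>;

    // Update interior rows [r0, r1) of `next` from `prev` with a simple
    // 5-point averaging stencil, standing in for the real update equation.
    void update_rows(const Grid& prev, Grid& next, std::size_t r0, std::size_t r1) {
        for (std::size_t i = r0; i < r1; ++i)
            for (std::size_t j = 1; j + 1 < prev[i].size(); ++j)
                next[i][j] = 0.25 * (prev[i - 1][j] + prev[i + 1][j] +
                                     prev[i][j - 1] + prev[i][j + 1]);
    }

    // One time step: split the interior rows between two threads, one per core.
    // Boundary rows and columns are left to whatever separate boundary
    // treatment the solver applies, as the post describes.
    void step(const Grid& prev, Grid& next) {
        const std::size_t rows = prev.size();
        const std::size_t mid  = rows / 2;
        std::thread top(update_rows, std::cref(prev), std::ref(next), std::size_t{1}, mid);
        std::thread bottom(update_rows, std::cref(prev), std::ref(next), mid, rows - 1);
        top.join();
        bottom.join();
    }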


8 posted on 07/25/2005 5:34:16 PM PDT by Flightdeck (Like the turtle, science makes progress only with its neck out.)

To: Terpfen; general_re; politicket

Pinging for helpful input.


9 posted on 07/25/2005 5:51:10 PM PDT by cabojoe

To: Paradox
I doubt the "2.5x" number you suggest. That would be what I call "super-scalar" - getting more than a CPU's worth of processing from an added CPU.

If the application is able to process in parallel, then the second CPU should get you total processing power 1.5 to 1.9+ times that of one CPU.

Since you say the application also works on very large systems, and since these sorts of numerical calculations are usually coded to scale nicely on large parallel systems, I suspect it parallelizes nicely and will get you close to twice the performance on two CPUs that you get on one. So two 3.0 GHz CPUs are worth about one 5.9+ GHz CPU.

The primary limitation that might bring you up short here is memory bandwidth. Push that as high as you can.
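(For context -- an editorial addition, not something the poster wrote: these estimates line up with the standard Amdahl's-law bound, where p is the fraction of the run that parallelizes perfectly and n the number of cores.)

    S(n) = \frac{1}{(1 - p) + p/n},
    \qquad S(2) = \frac{1}{0.05 + 0.475} \approx 1.90 \ \text{ for } p = 0.95.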

10 posted on 07/25/2005 6:27:03 PM PDT by ThePythonicCow (To err is human; to moo is bovine.)

To: ThePythonicCow
I doubt the "2.5x" number you suggest. That would be what I call "super-scalar" - getting more than a CPU's worth of processing from an added CPU.

DOH! I don't know WHERE I got those numbers from. 2.5? LOL, it's been a long day. I meant more like 1.5-1.75x the speed. After reading that review of the AMD dual-core processors, I am thinking that you are probably right: perhaps up to 1.9x the speed, if the apps are compiled with multiple processors in mind. Thanks for the correction!

11 posted on 07/25/2005 6:43:42 PM PDT by Paradox (Its a good thing that even when you dismiss the existence of God, he doesn't dismiss you.)

To: Flightdeck

Check out the new AMD CPUs coming this winter. I will hold out long enough to buy one of the new 5000+ CPUs.

See http://www.theregister.co.uk/2005/07/25/amd_roadmap_05-06/ for a preview of what is coming soon.


12 posted on 07/25/2005 7:49:36 PM PDT by texas booster (Bless the legal immigrants!)

To: Flightdeck

Are you implementing this application yourself? If so, what language are you using? An application can "get by" by using thread-safe data structures in languages like Java.

If you are not rolling your own, then are you using a public or commercial package?


13 posted on 07/25/2005 9:07:19 PM PDT by the_Watchman

To: Flightdeck

Depends on what you are doing.

Gaming would be worse off -- little better than a 3.0 GHz single-core.

However, the makers of games will jump on this new technology as they always do, so in a couple of years you will be much better off with dual, quad or more cores.

Note that Intel is not going to scale nearly as well as an Athlon 64 or Opteron, mostly because a dual-core P4 is really just two P4s coupled together. The AMD solution is designed from the ground up to be multi-core, so each core communicates much better with the other.


14 posted on 07/25/2005 9:41:11 PM PDT by ImphClinton (Four More Years Go Bush)

To: the_Watchman; Paradox
Are you trying to tell us that a Paradox runs 2.5 times faster than a single doc? :)

There, fixed it.

15 posted on 07/25/2005 10:54:48 PM PDT by LibertarianInExile (Kelo, Grutter, and Roe all have to go. Will Roberts get us there--don't know. No more Souters.)

To: the_Watchman; Flightdeck

80% seems a realistic improvement in many cases. Anyway, as you say, it all depends on the problem and the implementation -- if the problem isn't parallelizable to begin with, or if there's a great deal of inter-process communication (IPC) overhead, you might very well be better served by a fast single processor. Or if the person writing the software doesn't know what they're doing, dual cores might not show much of an improvement ;)


16 posted on 07/26/2005 6:10:27 AM PDT by general_re ("Frantic orthodoxy is never rooted in faith, but in doubt." - Reinhold Niebuhr)

To: texas booster
So will a 2.6GHz dual-core Athlon X2, possibly with a 5000+ rating, according to AMD roadmaps seen by AnandTech.

Holy crap! That "5000+" sounds like it's only applicable to apps expressly written for dual-core chips, though. Anyone know if I'm wrong on that one?

17 posted on 07/26/2005 6:17:08 AM PDT by Future Snake Eater (The plan was simple, like my brother-in-law Phil. But unlike Phil, this plan just might work.)

To: the_Watchman

"Are you implementing this application?"

Yeah. Writing it in C++ and compiling it using Borland at home. The mainframe uses a different compiler, but I don't think it matters.


18 posted on 07/26/2005 6:40:23 AM PDT by Flightdeck (Like the turtle, science makes progress only with its neck out.)

To: general_re

"Or if the person writing the software doesn't know what they're doing, dual cores might not show much of an improvement ;)"

I am writing the software, but I'm not the most knowledgeable programmer; I'm more of a theoretician using a computer to get results. There are about a billion references out there -- do you recommend one in particular for writing parallel code?


19 posted on 07/26/2005 6:44:18 AM PDT by Flightdeck (Like the turtle, science makes progress only with its neck out.)

To: Flightdeck
Honestly, I wouldn't feel comfortable pointing you one way or another - it'd be the blind leading the blind, so to speak ;)

Writing parallel applications, particularly massively parallel applications, is a fairly arcane art, and not one that I have much experience with. Hopefully someone who knows a bit about it will chime in, tho...

20 posted on 07/26/2005 6:53:53 AM PDT by general_re ("Frantic orthodoxy is never rooted in faith, but in doubt." - Reinhold Niebuhr)

