Oak Ridge goes gaga for Nvidia GPUs (Fermi is the name)

The Register ^ | 1st October 2009 22:06 GMT | Timothy Prickett Morgan

Posted on 10/02/2009 9:42:39 AM PDT by Ernest_at_the_Beach

Oak Ridge National Laboratories may not be the first customer that Nvidia will have for its new "Fermi" graphics processor, which was announced yesterday, but it will very likely be one of the largest customers.

Oak Ridge, one of the giant supercomputing centers managed and funded by the US Department of Energy to do all kinds of simulations and supercomputing design research, has committed to using the GPU co-processor variants of the Fermi chips, the kickers to the current Tesla GPU co-processors, in a future hybrid system that would have ten times the floating point oomph of the fastest supercomputer installed today.

Depending on the tests you want to use, the most powerful HPC box in the world is either the Roadrunner hybrid Opteron-Cell massively parallel custom blade box made by IBM for Los Alamos National Laboratory, or the Jaguar massively parallel XT5 machine at Oak Ridge, which uses only the Opterons to do calculations.

The Roadrunner machine relies on the Cell chips, which are themselves a kind of graphics processor with a single Power core linked into it, to do the heavy lifting on floating point calculations. The compute nodes in the Roadrunner are comprised of a two-socket blade server using dual-core Opteron processors running at 1.8GHz.

Advanced Micro Devices has six-core Istanbul Opterons in the field that are pressing up against the 3GHz performance barrier. But shifting to these faster x64 chips would not radically improve the overall performance of the Roadrunner machine.

Going faster miles an hour

Each Opteron blade uses HyperTransport links out to the PCI-Express bus to link to two dual-socket Cell blades.

(Excerpt) Read more at theregister.co.uk ...

TOPICS: Business/Economy; Computers/Internet; Science
KEYWORDS: fermi; hitech; nvidia

1 posted on 10/02/2009 9:42:40 AM PDT by Ernest_at_the_Beach

To: ShadowAce

Interesting article ...

2 posted on 10/02/2009 9:43:32 AM PDT by Ernest_at_the_Beach (Support Geert Wilders)

To: rdb3; Calvinist_Dark_Lord; GodGunsandGuts; CyberCowboy777; Salo; Bobsat; JosephW; ...

3 posted on 10/02/2009 9:48:51 AM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: Ernest_at_the_Beach

4 posted on 10/02/2009 9:50:59 AM PDT by HangnJudge

To: Ernest_at_the_Beach

5 posted on 10/02/2009 9:51:56 AM PDT by HangnJudge

To: Ernest_at_the_Beach

http://www.nccs.gov/jaguar/

Petascale Computing on Jaguar
The National Center for Computational Sciences (NCCS), sponsored by the Department of Energy’s (DOE) Office of Science, manages the 1.64-petaflop Jaguar supercomputer for use by scientists and engineers solving problems of national and global importance. The new petaflops machine will make it possible to address some of the most challenging scientific problems in areas such as climate modeling, renewable energy, materials science, fusion and combustion. Annually, 80 percent of Jaguar’s resources are allocated through DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, a competitively selected, peer reviewed process open to researchers from universities, industry, government and non-profit organizations.

Through a close, four-year partnership between ORNL and Cray, Jaguar has delivered state-of-the-art computing capability to scientists and engineers from academia, national laboratories and industry. The XT system has grown in strength through a series of advances since being installed as a 25-teraflop XT3 in 2005. By early 2008 Jaguar was a 263-teraflop Cray XT4 able to solve some of the most challenging problems that could not be solved otherwise. In 2008 Jaguar was expanded with the addition of a 1.4-petaflop Cray XT5. The resulting system has over 181,000 processing cores connected internally with Cray’s Seastar2+ network. The XT4 and XT5 parts of Jaguar are combined into a single system using an InfiniBand network that links each piece to the Spider file system.

Throughout its series of upgrades, Jaguar has maintained a consistent programming model for the users. This programming model allows users to continue to evolve their existing codes rather than write new ones. Applications that ran on previous versions of Jaguar can be recompiled, tuned for efficiency, and then run on the new machine.

Jaguar is the most powerful computer system for science with world leading performance, more than three times the memory of any other computer, and world leading bandwidth to disks and networks. The AMD Opteron processor is a powerful, general purpose processor that uses the X86 instruction set which has a rich set of applications, compilers, and tools. Jaguar has hundreds of applications that have been ported and run on the Cray XT system, many of which have been scaled up to run on 25,000 to 150,000 cores. Jaguar is ready to take on the most challenging problems for the world.

6 posted on 10/02/2009 9:53:18 AM PDT by HangnJudge

To: Ernest_at_the_Beach

I suspect Oak Ridge will not be their only large customer.

25C3: Hackers completely break SSL using 200 PS3s

7 posted on 10/02/2009 9:56:09 AM PDT by cynwoody

To: cynwoody

That doesn’t look like a basement operation....

8 posted on 10/02/2009 10:32:07 AM PDT by Ernest_at_the_Beach (Support Geert Wilders)

To: All

related threads:

*************************************************

NVIDIA Takes GPU Computing to the Next Level
Thu 01 Oct 2009 10:48:34 AM PST · by Ernest_at_the_Beach · 13 replies · 273+ views
HPCwire ^ | September 29, 2009 | Michael Feldman, HPCwire Editor

Nvidia Puts On Graphic Power Display With Fermi ( Natl Labs interested )
Thu 01 Oct 2009 01:46:49 PM PST · by Ernest_at_the_Beach · 11 replies · 275+ views
Technews World ^ | 10/01/09 11:52 AM PT | Richard Adhikari

NVIDIA's Fermi: Architected for Tesla, 3 Billion Transistors in 2010
Thu 01 Oct 2009 09:17:53 AM PST · by Ernest_at_the_Beach · 43 replies · 643+ views
Anandtech ^ | September 30th, 2009 | Anand Lal Shimpi

9 posted on 10/02/2009 10:36:12 AM PDT by Ernest_at_the_Beach (Support Geert Wilders)

To: All

Also from the Register:

Nvidia fires off Fermi, pledges radical new GPUs

*******************************EXCERPT*****************************

Three billion-transistor HPC chip, anyone?

By Tony Smith • 1st October 2009 11:18 GMT

Nvidia last night introduced the new GPU design that will feed into its next-generation GeForce graphics chips and Tesla GPGPU offerings but which the company also hopes will drive it ever deeper into general number crunching.

While the new chip is dubbed 'Fermi', so is the architecture that connects a multitude of what Nvidia calls "Streaming Multiprocessors" (SMs). The SM design the company outlined yesterday contains 32 basic Cuda cores - four times as many as found in previous generations of SM - each comprising one integer and one floating-point maths unit. It is able to schedule two groups of 32 threads - a group Nvidia calls a "warp" - at once.

Nvidia's Fermi: each of the 16 green strips is...

The networked cores connect to 64KB of shared L1 cache, also used by four Special Function Units (SFUs) which handle complex maths formulae such as sines and cosines.

Fermi itself packs in 16 SMs - that's 512 Cuda cores in total - which tap into a shared 768KB L2 cache and can reach out to a maximum of 6GB of GDDR5 memory over a 384-bit interface, with ECC support.
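The figures quoted in this excerpt can be cross-checked with a little arithmetic. The following is only a rough sketch in Python: the constant names are made up for illustration, and the values are taken from the article text, not from any Nvidia specification.

```python
# Figures as quoted in the Register excerpt (not verified against Nvidia docs).
CORES_PER_SM = 32          # "32 basic Cuda cores" per Streaming Multiprocessor
SMS_PER_CHIP = 16          # "Fermi itself packs in 16 SMs"
WARP_SIZE = 32             # a "warp" is a group of 32 threads
WARPS_SCHEDULED = 2        # "schedule two groups of 32 threads ... at once"
MEMORY_BUS_BITS = 384      # "384-bit interface"

# Total Cuda cores on the chip: should match the article's "512".
total_cores = CORES_PER_SM * SMS_PER_CHIP

# "Four times as many as found in previous generations of SM" implies
# the earlier SM design had a quarter as many cores.
previous_cores_per_sm = CORES_PER_SM // 4

# Threads a single SM can have scheduled at once, per the article.
threads_scheduled_per_sm = WARP_SIZE * WARPS_SCHEDULED

# Width of the memory interface expressed in bytes per transfer.
bus_bytes_per_transfer = MEMORY_BUS_BITS // 8

print(f"Cuda cores per chip:        {total_cores}")
print(f"Cores per SM, previous gen: {previous_cores_per_sm}")
print(f"Threads scheduled per SM:   {threads_scheduled_per_sm}")
print(f"Memory bus width in bytes:  {bus_bytes_per_transfer}")
```

The first line of output agrees with the 512-core total stated in the article; the other values are simple consequences of the quoted numbers.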

This is only the first Fermi GPU design. It's aimed at science and engineering GPGPU apps rather than game graphics, so future Fermi-based GeForce chips will likely sport less complex layouts. GT300, Nvidia's next GPU core, will be derived from Fermi, but don't expect it to show off all the superlatives Nvidia has been claiming for the Fermi chip.

...one of these 32-core Streaming Multiprocessors

10 posted on 10/02/2009 10:41:12 AM PDT by Ernest_at_the_Beach (Support Geert Wilders)

To: Ernest_at_the_Beach

Just imagining the next generation of this type of system that could be around as soon as next year. Think of a quad eight-core Xeon with four 32-core Larrabee PCIe cards (each Xeon running one card). That’s about 8.5 TFLOPs in a 3U using off-the-shelf parts, nothing customized. You could hit a petaFLOP in nine racks.

Yes, I know technically some video cards can easily beat this, but the range of problems they can be programmed to solve is much more limited (I learned that from Folding@Home).
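For anyone who wants to check the nine-rack estimate above, here is a minimal sketch of the arithmetic in Python. The 8.5 TFLOPs and 3U figures come from the post; the 42U rack height is an assumption, as is filling every rack entirely with compute nodes (no space reserved for switches, power or cooling). The function name is hypothetical.

```python
import math

TFLOPS_PER_NODE = 8.5    # from the post: one quad-Xeon box with four Larrabee cards
NODE_HEIGHT_U = 3        # from the post: each box is 3U
TARGET_TFLOPS = 1000.0   # one petaFLOP
RACK_HEIGHT_U = 42       # ASSUMPTION: standard full-height rack, all slots used


def racks_for_target(target=TARGET_TFLOPS, per_node=TFLOPS_PER_NODE,
                     node_u=NODE_HEIGHT_U, rack_u=RACK_HEIGHT_U):
    """Return (nodes needed, nodes per rack, racks needed) for a peak target."""
    nodes = math.ceil(target / per_node)      # whole boxes needed to reach target
    nodes_per_rack = rack_u // node_u         # whole boxes that fit in one rack
    racks = math.ceil(nodes / nodes_per_rack)
    return nodes, nodes_per_rack, racks


nodes, per_rack, racks = racks_for_target()
print(f"{nodes} nodes, {per_rack} per rack, {racks} racks")
```

Under those assumptions the numbers work out to 118 boxes at 14 per rack, which lands on nine racks. These are peak figures only; sustained performance and real rack layouts would change the count.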

11 posted on 10/02/2009 2:16:12 PM PDT by antiRepublicrat

To: All

HardOCP has an Nvidia White paper available....:

NVIDIA's "Fermi" Architecture White Paper

12 posted on 10/02/2009 6:14:05 PM PDT by Ernest_at_the_Beach (Support Geert Wilders)

To: antiRepublicrat

See link at #12.

13 posted on 10/02/2009 6:24:54 PM PDT by Ernest_at_the_Beach (Support Geert Wilders)

To: Ernest_at_the_Beach

bump

14 posted on 10/02/2009 6:26:20 PM PDT by Captain Beyond (The Hammer of the gods! (Just a cool line from a Led Zep song))

To: HangnJudge

That’s nice... but can it run a Java app? </sarc>

15 posted on 10/02/2009 9:00:31 PM PDT by DigitalVideoDude (It's amazing what you can accomplish when you don't care who gets the credit. -Ronald Reagan)

To: Ernest_at_the_Beach

Right now I’m more interested in Larrabee. The biggest thing for me is that the Larrabee is full of general-purpose chips. It’s not just a huge collection of floating point units.

16 posted on 10/02/2009 11:07:14 PM PDT by antiRepublicrat

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.
