Posted on 08/23/2003 7:55:14 AM PDT by justlurking
For immediate release, August 22, 2003, Lexington, Kentucky: Researchers at the University of Kentucky have constructed and demonstrated an innovative new, scalable, parallel supercomputer that achieves application performance of more than 1 billion floating point operations per second (GFLOPS) for every $100 spent on building the machine. The approach used to design and build this machine makes it cost-effective for solving a wide range of problems, from drug design using computational chemistry to design of quieter printers using computational fluid dynamics (CFD). Thus, this breakthrough is not only a milestone, but also will enable many more scientists and engineers to use computational models.
A decade ago, supercomputers cost about $1,000,000 per GFLOPS of performance. By using standard PC parts, "Beowulf" cluster supercomputers dramatically reduce the cost, but as processors and other components have become faster and cheaper, the network needed to coordinate them has become relatively expensive. The University of Kentucky researchers made their first breakthrough in reducing network cost in May 2000, when KLAT2, Kentucky Linux Athlon Testbed 2 (http://aggregate.org/KLAT2/), used standard 100Mb/s Fast Ethernet hardware in the world's first machine-designed asymmetric cluster network and achieved $640 per GFLOPS, breaking the $1,000 per GFLOPS barrier. Their newest machine, KASY0, Kentucky Asymmetric Zero (http://aggregate.org/KASY0/), uses a more advanced type of asymmetric network design to break the $100 per GFLOPS barrier.
A well-known reference for supercomputer performance is http://top500.org/, which lists the 500 supercomputers that obtain the highest GFLOPS speed executing the Linpack benchmark program. Performance on that program depends partly on the theoretical peak GFLOPS of the processors, but also on the parallel implementation and on the efficiency of the network that allows the processors to work together. In the current (June 2003) list, most systems use expensive, specialized network hardware. The machines explicitly listed as using standard 100Mb/s Fast Ethernet achieve an average of less than 8.5% of peak. The average for the systems listed as using Gigabit Ethernet is somewhat better, at about 30% of peak. In contrast, KASY0's 100Mb/s Fast Ethernet network allows it to achieve 187.3 GFLOPS, over 35% of peak, using a double-precision version of the benchmark (HPL). Using a single-precision version, the $39,454.31 KASY0 obtains over 471.5 GFLOPS, more than 44% of its theoretical peak and less than $84 per GFLOPS.
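The price/performance and efficiency figures above are simple ratios. A minimal Python sketch using only numbers quoted in this post (the 1062.4 GFLOPS single-precision peak is the figure given later, in the SDX comparison); the helper names are illustrative, not from the original:

```python
# Price/performance and fraction-of-peak arithmetic for KASY0.
# Constants are figures quoted in the post; helper names are illustrative.

TOTAL_COST_USD = 39_454.31      # delivered system cost
HPL_DOUBLE_GFLOPS = 187.3       # double-precision HPL result
HPL_SINGLE_GFLOPS = 471.5       # single-precision HPL result
PEAK_SINGLE_GFLOPS = 1062.4     # single-precision theoretical peak


def dollars_per_gflops(cost_usd: float, gflops: float) -> float:
    """Cost divided by sustained (measured) performance."""
    return cost_usd / gflops


def fraction_of_peak(measured_gflops: float, peak_gflops: float) -> float:
    """Share of the theoretical peak actually achieved on the benchmark."""
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    single = dollars_per_gflops(TOTAL_COST_USD, HPL_SINGLE_GFLOPS)
    double = dollars_per_gflops(TOTAL_COST_USD, HPL_DOUBLE_GFLOPS)
    eff = fraction_of_peak(HPL_SINGLE_GFLOPS, PEAK_SINGLE_GFLOPS)
    print(f"single precision: ${single:.2f} per GFLOPS")   # $83.68
    print(f"double precision: ${double:.2f} per GFLOPS")   # $210.65
    print(f"single-precision efficiency: {eff:.1%}")        # 44.4%
```

Note that the same machine lands on either side of the $100 barrier depending on precision: roughly $84 per GFLOPS in single precision, roughly $211 in double precision.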
The remarkable thing about KASY0's price/performance is that, while network hardware is often the dominant cost for a system of its size (128 nodes plus 4 spares), less than 11% of the system cost went to network hardware. The AMD Athlon XP 2600+ processors were more than 35% of the total system cost; memory was 21%. Even more significantly, the network design technology that made this possible can be applied with similar benefit to cluster supercomputers with thousands of nodes. KLAT2's network was the world's first Flat Neighborhood Network (FNN); the enhanced version used for KASY0 is the world's first Sparse Flat Neighborhood Network (SFNN). KASY0 is also the first supercomputer to have its physical node and switch placement optimized by a computer program. FNN design technology and tools have been freely available and used by various other groups; the new SFNN technology will be freely available as well.
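The release does not define a Flat Neighborhood Network. As the Aggregate.org group describes it, each node has several NICs plugged into different switches, and the wiring is chosen so that every pair of nodes shares at least one switch; any two nodes can then talk through a single switch, with no switch-to-switch hop. Our reading of the "sparse" variant (an assumption, not stated in the release) is that it only guarantees this for the node pairs the expected communication patterns actually use. A minimal Python sketch that checks both properties for a given wiring; the node and switch names and both example wirings are hypothetical:

```python
from itertools import combinations

# A wiring maps each node to the set of switches its NICs plug into.
Wiring = dict[str, set[str]]


def shares_switch(wiring: Wiring, a: str, b: str) -> bool:
    """True if nodes a and b have at least one switch in common (one-hop path)."""
    return bool(wiring[a] & wiring[b])


def is_flat_neighborhood(wiring: Wiring) -> bool:
    """FNN property: every pair of nodes shares at least one switch."""
    return all(shares_switch(wiring, a, b) for a, b in combinations(wiring, 2))


def covers_pairs(wiring: Wiring, required: list[tuple[str, str]]) -> bool:
    """Sparse check (assumed SFNN goal): only the listed pairs must share a switch."""
    return all(shares_switch(wiring, a, b) for a, b in required)


# Hypothetical 4-node, 3-switch wiring with two NICs per node: fully flat.
full = {"n0": {"A", "B"}, "n1": {"A", "C"}, "n2": {"B", "C"}, "n3": {"A", "B"}}

# Hypothetical one-NIC wiring: not flat, but enough if only n0<->n1 and n2<->n3 talk.
sparse = {"n0": {"A"}, "n1": {"A"}, "n2": {"B"}, "n3": {"B"}}

if __name__ == "__main__":
    print(is_flat_neighborhood(full))                          # True
    print(is_flat_neighborhood(sparse))                        # False
    print(covers_pairs(sparse, [("n0", "n1"), ("n2", "n3")]))  # True
```

This only verifies a wiring; the hard part the release refers to, having a program search for a good wiring and physical placement, is not shown.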
KASY0 is not a toy or a "hack"; it is a serious demonstration of a fundamental advance in network design. The only other supercomputer we have seen claim price/performance close to KASY0's is a $50,000+ system built by the National Center for Supercomputing Applications (NCSA) using 70 PlayStation2 units. Not only does KASY0 have a vastly superior network and significantly higher peak floating-point performance per node, but its lower price also buys many more nodes and real application performance, not just high peak numbers.
For example, KASY0 also has set a new world record for rendering a complex image using the Persistence of Vision Raytracer (POV-Ray). Executing pvmpovray 3.5 on KASY0 to render the standard benchmark.pov scene yielded a time of 72 seconds. According to this site, the previous record was 107 seconds, set on August 1, 2003 by a cluster costing $79,000.
The primary architect of KASY0 is Tim Mattox, a research assistant who has been developing the Sparse Flat Neighborhood Network concept for his Ph.D. thesis. As an educational experience available to anyone, the physical construction of KASY0 was done entirely by volunteers at the University of Kentucky.
From the creation of the first Linux PC cluster in February 1994 to the construction of KASY0, Hank Dietz and his students have continued to improve cluster performance by making compilers, hardware architecture, and operating systems work together more efficiently. Dietz, Professor of Electrical and Computer Engineering and James F. Hardymon Chair in Networking at the University of Kentucky, aims to develop and freely disseminate the new technologies that will allow scientists and engineers to solve their most important computational problems.
The following is the complete (or nearly so) parts list. It may be useful to compare it with our previous big system, KLAT2, whose pricing summary is given below. Note that the cost of every item must be tracked for us to justify our claim of setting a new price/performance record.
Subsystem | Description | Model/Part Number | Vendor/Source | Quantity | Delivered Price |
---|---|---|---|---|---|
Node | Athlon XP 2600+ Processor | Athlon XP 2600+ Retail (333MHz FSB) | MonarchComputer.com | 128 | $13690.00 |
Node | Athlon XP 2600+ Processor | Athlon XP 2600+ Retail (333MHz FSB) | Googlegear.com | 4 | $400.00 |
Node | 512MB PC2700 DDR SDRAM | Crucial CT6464Z335 | MWave.Com | 132 | $8316.00 |
Node | Athlon XP Motherboard | BioStar M7VIT Pro | MWave.Com | 132 | $6996.00 |
Node | Node case + 400W power supply | 6042L Codegen | 4GoldenBridge.Com | 64 | $2462.00 |
Node | Node case + 400W power supply | 6042L Codegen | PineComputer.Com | 68 | $2380.00 |
Nodes Subtotal | | | | | $34244.00 |
Network | Fast Ethernet NIC | Linksys LNE100TX | AlanComputech.Com | 280 | $2082.00 |
Network | 24-port Fast Ethernet Switch | BenQ SE0024 | NewEgg.Com | 6 | $432.00 |
Network | 24-port Fast Ethernet Switch | BenQ SE0024 | AudioExchange.Com | 10 | $772.03 |
Network | 24-port Fast Ethernet Switch | BenQ SE0024 | existing equipment | 2 | $152.00 |
Network | Cat5e 15-foot Cable (9 colors) | CBLC515 | LanAdapters.Com | 450 | $807.95 |
Network Subtotal | | | | | $4245.98 |
Support | 6-Shelf Commercial Chrome Rack | Sku# 831725 | SamsClub.Com | 6 | $548.60 |
Support | 20" Box Fan | Lakewood Model 202 | WalMart.Com | 2 | $22.09 |
Support | Surge Protector Power Strip | Statitec 6-outlet Sku# 808571 | SamsClub.Com | 1 (32-pack) | $173.64 |
Support | Materials for power mounts | BC Plywood, 2x4s, 7/8" dowels, glue, screws, paint | local places/stock | n/a | $20.00 |
Assembly | Food for student helpers | 4 dozen Panera Bagels, 11 large Papa John's Pizzas, 5 cases assorted soft drinks, 1 case party mix, 1 case Grandma's cookies, 2 cases miniature cheesecakes | local places | n/a | $200.00 |
Misc. Subtotal | | | | | $964.33 |
Total | | | | | $39454.31 |
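Because the record claim rests on every item being counted, the table's arithmetic is easy to check. A minimal Python sketch with prices transcribed from the table, grouping the Support and Assembly rows as "Misc." to match the subtotal rows; `Decimal` is used so currency sums are exact:

```python
from decimal import Decimal

# (subtotal group, delivered price) for each row of the KASY0 parts table.
ITEMS = [
    ("Nodes", "13690.00"), ("Nodes", "400.00"), ("Nodes", "8316.00"),
    ("Nodes", "6996.00"), ("Nodes", "2462.00"), ("Nodes", "2380.00"),
    ("Network", "2082.00"), ("Network", "432.00"), ("Network", "772.03"),
    ("Network", "152.00"), ("Network", "807.95"),
    ("Misc.", "548.60"), ("Misc.", "22.09"), ("Misc.", "173.64"),
    ("Misc.", "20.00"), ("Misc.", "200.00"),
]


def subtotals(items: list[tuple[str, str]]) -> dict[str, Decimal]:
    """Sum delivered prices per subtotal group, in exact decimal arithmetic."""
    totals: dict[str, Decimal] = {}
    for group, price in items:
        totals[group] = totals.get(group, Decimal("0")) + Decimal(price)
    return totals


if __name__ == "__main__":
    groups = subtotals(ITEMS)
    total = sum(groups.values(), Decimal("0"))
    for name, amount in groups.items():
        print(f"{name:8} ${amount}  ({amount / total:.1%} of total)")
    print(f"Total    ${total}")  # $39454.31
    # The first two rows are the 128 + 4 Athlon processors.
    cpus = sum((Decimal(p) for _, p in ITEMS[:2]), Decimal("0"))
    print(f"CPUs     {cpus / total:.1%} of total")
```

Running it reproduces the three subtotals and the total, and confirms the release's shares: network hardware just under 11% and processors just over 35%.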
We do not include the Nikon 950 camera which we have mounted in the cluster because it is completely unrelated to the cluster's operation, serving as a webcam and security monitor for the entire lab. Neither does the cost above include a firewall or a "head node," because the entire lab is behind an old PC used as a firewall and KASY0's configuration allows any of a number of existing machines to serve as "head nodes" for different purposes (i.e., KASY0 is a cluster of peer nodes). If we were to use included spare hardware for the firewall and head node, the only additional cost would be less than $100 for a disk drive.
The assembly cost might seem low, but we easily could have obtained comparable-performance assembled systems for similar pricing. In a university setting, it is simply more appropriate to give students the experience of building the systems and, as a side benefit, we get better control over the precise choice of components used and how they are assembled. For example, each case came with two side fans, which we converted into a redundant stack venting out the back. We also took a cost hit on doing our own assembly in a variety of ways; for example, shipping is higher for parts than for assembled systems. Another cost hit came because we bought just 4 processors to test-assemble systems with, and only then ordered the other 128; had we ordered everything at once (as we would have for assembled systems), all the processors would have been purchased before a 7% price increase hit. We didn't cut corners on anything: note that we counted spares in the cost, the cases have 400W power supplies, the processors have full-warranty retail packages, and even the power strips came with surge protection and full insurance for the protected equipment.
Thus, KASY0's "street cost" is under $40,000 by any accounting. In comparison, KLAT2's "street cost" was $41,205 for just 64+2 nodes, each of which was about 1/3 the speed of a KASY0 node. The memory size is also 4x per node, 8x total. Network latency typically will be identical, with total bandwidth about 1.5x that of KLAT2. The accounting of the network cost is somewhat debatable in that each node motherboard contains one built-in NIC; we counted that NIC as part of the node cost, not network cost, because the board isn't available without the NIC. Even if we had ignored the built-in NIC and purchased more NICs, the network on KASY0 is close to half the cost of that on KLAT2 -- after all, it even uses narrower switches than KLAT2 did: 24-port vs. 32-port switches.
An even cuter comparison is with a $50,000+ system built using 70 PlayStation2 units. Not only does KASY0 have a vastly superior network and significantly higher floating-point performance per node (8 GFLOPS vs. 6.5 GFLOPS for the PS2), but we get LOTS more nodes!
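The per-node figures translate into aggregate peak numbers by simple multiplication. A small Python sketch using the numbers in this post; it treats the PS2 system's "$50,000+" as exactly $50,000 (so the PS2 cost per GFLOPS is a lower bound), counts only KASY0's 128 compute nodes, and uses the rounded 8 GFLOPS per node, so KASY0's aggregate comes out slightly below the 1062.4 GFLOPS peak quoted elsewhere:

```python
def aggregate_peak(nodes: int, gflops_per_node: float) -> float:
    """Theoretical peak of the whole machine, ignoring network effects."""
    return nodes * gflops_per_node


kasy0_peak = aggregate_peak(128, 8.0)  # 1024.0 GFLOPS
ps2_peak = aggregate_peak(70, 6.5)     # 455.0 GFLOPS

# Dollars per peak GFLOPS; the PS2 cost is a lower bound ("$50,000+").
kasy0_cost_per_peak = 39_454.31 / kasy0_peak  # about $38.53
ps2_cost_per_peak = 50_000 / ps2_peak         # about $109.89

if __name__ == "__main__":
    print(f"KASY0: {kasy0_peak:.0f} GFLOPS peak, ${kasy0_cost_per_peak:.2f}/GFLOPS")
    print(f"PS2:   {ps2_peak:.0f} GFLOPS peak, ${ps2_cost_per_peak:.2f}/GFLOPS")
```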
For those interested in how this compares to UK's SDX HP Superdome, the quick answer is that the two machines are very different, but have roughly comparable performance. The HP is a vendor-packaged system with more processors (224 total: 3x64 + 1x32 750MHz HP PA-RISC 8700), more memory (448GB), and a higher double-precision speed (672 GFLOPS peak, 431.7 GFLOPS Linpack). On the other hand, KASY0's integer and single-precision speeds are faster (e.g., 1062.4 GFLOPS peak, 471.5 GFLOPS Linpack), it is a homogeneous system (not a cluster of different-sized shared memory systems), and power consumption is several times lower. Oh yeah: KASY0 also is much cheaper!
Check out the last item in the cost breakdown. Do you think someone brought something beyond the typical techgeek diet, but didn't include it?
KLAT2's cost is somewhat difficult to specify precisely because the most expensive components, the Athlon processors, were donated by their manufacturer, AMD (Advanced Micro Devices). Here, we quote the retail price for these processors as found on Multiwave's WWW site on May 3, 2000. Similarly, although most applications use only 64 nodes, KLAT2 also has 2 "hot spare" nodes and an additional switch layer that are used for fault tolerance and system-level I/O; because we consider these components to be an integral part of KLAT2's design, we include their cost. We also included 16 spare NICs and several spare surge protectors. Due to University of Kentucky purchasing guidelines and part stocking issues, purchases from the same vendor were sometimes split in odd ways and there were various inconsistencies about how shipping was charged; although the vendor totals are correct, we have had to approximate the component cost breakdown in these cases.
The following table details the cost of KLAT2. Although specific vendors are listed, note that being listed here should not be taken as an implicit endorsement by the authors or by the University of Kentucky. Aside from donation of the Athlons, there were no exceptional discounts or other arrangements with any of the vendors.
Vendor and Part Descriptions | Cost |
---|---|
AMD: 66 Donated 700MHz Athlon OEM processor modules @ ~$200 | $13,200 |
MemoryX: 66 128MB PC100 CAS2 SDRAMs @ $93 | $6,182 |
Technology Partners: 66 Polaris II ATX Mid Towers 300W @ $58; 66 Sony 1.44MB Floppy Drives (for net boot) @ $11 | $5,455 |
Multiwave Technology: 66 FIC SD11 Motherboards @ $104; 10 Smartlink 32-port wire-speed 100Mb/s switches @ $527; 28 Smartlink 100Mb/s NIC 10-packs @ $80 | $14,338 |
Buy.Com: 32 Hawking 15' color-coded Cat.5 cable 5-packs @ $9; 32 Hawking 15' transparent color-coded Cat.5e cable 5-packs @ $12 | $706 |
Coolerstar: 66 AMD K7/PII Dual Fans (CPU heat sinks & fans) @ $5; 66 DC Fans 80mm (extra case fans) @ $4 | $607 |
Lowe's: 4 48"x18"x72" black wire-frame shelves @ $64 | $256 |
Wal-mart: 2 WindDance fans (to direct airflow between shelves) @ $15; 20 Surgestrip model 201 surge protectors @ $4 | $109 |
Various local stores: 16 3" diameter threaded-mount wheels for shelves @ $9; 16 Pizzas for student helpers @ $10; 4 Cases of soda for student helpers @ $7; 4 2" diameter threaded-mount wheels for rack @ $7 | $352 |
Available at no cost/indirectly used items: 1 10-year-old rack & mounting hardware; 1 Surplus 17" monitor used for cluster status; 1 Old PCI video card used for cluster status; 1 18GB EIDE disk drive; 66 sets of inkjet-printed labels, one per node; 2 other clusters used for KLAT2's HW design and SW development | |
Total | $41,205 |
In summary, KLAT2's total value is about $41,200, with the primary costs being roughly $13,200 in processors, $8,100 in the network, $6,900 in motherboards, and $6,200 in memory.
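The KLAT2 vendor totals can be checked the same way, and the summary figures make the network-cost comparison with KASY0 concrete. A minimal Python sketch using only numbers from this post; the KLAT2 category costs are the rounded values from the summary sentence, so the resulting shares are approximate:

```python
# KLAT2 vendor totals (USD), transcribed from the table above.
KLAT2_VENDOR_TOTALS = {
    "AMD (donated, retail value)": 13_200,
    "MemoryX": 6_182,
    "Technology Partners": 5_455,
    "Multiwave Technology": 14_338,
    "Buy.Com": 706,
    "Coolerstar": 607,
    "Lowe's": 256,
    "Wal-mart": 109,
    "Various local stores": 352,
}

# Rounded category costs from the summary sentence.
KLAT2_SUMMARY = {"processors": 13_200, "network": 8_100,
                 "motherboards": 6_900, "memory": 6_200}

total = sum(KLAT2_VENDOR_TOTALS.values())  # 41205

# KASY0 network share, from its parts table (network subtotal / total).
KASY0_NETWORK_SHARE = 4_245.98 / 39_454.31

if __name__ == "__main__":
    for part, cost in KLAT2_SUMMARY.items():
        print(f"KLAT2 {part:12} ~{cost / total:.0%} of ${total:,}")
    print(f"KASY0 network       {KASY0_NETWORK_SHARE:.1%}")
```

By this rough accounting the network's share of system cost fell from about a fifth on KLAT2 to about a tenth on KASY0.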
Kentucky is better by far.
Whaaaat? They're not running XP Home Edition?
For chuckles, consider the cost if they'd had to include the cost of MS software licences for each CPU. The humanity!
No, there is no floppy. I believe the MSI motherboard BIOS can be configured to boot from a network address: I built a system with a different MSI motherboard earlier this year for someone else and noticed the functionality.
From what I've been able to find, the subsequent steps are to get an IP address from a DHCP server, then download the kernel from a TFTP server. From there, it's clear sailing.
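The TFTP step described above is a very small protocol (RFC 1350). A minimal Python sketch of the two packet formats a net-booting client needs: a read request for the kernel and the decoding of a returned data block. The filename is hypothetical, and this shows only the wire format, not a working boot client (no sockets, ACKs, retransmission, or DHCP):

```python
import struct

OP_RRQ = 1        # read request (RFC 1350)
OP_DATA = 3       # data block
BLOCK_SIZE = 512  # a DATA payload shorter than this marks the last block


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Opcode (2 bytes, big-endian), filename, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def parse_data(packet: bytes) -> tuple[int, bytes, bool]:
    """Return (block number, payload, is_last) for a DATA packet."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError(f"expected DATA (3), got opcode {opcode}")
    payload = packet[4:]
    return block, payload, len(payload) < BLOCK_SIZE


if __name__ == "__main__":
    print(build_rrq("kernel.img"))               # hypothetical kernel filename
    print(parse_data(b"\x00\x03\x00\x01hello"))  # (1, b'hello', True)
```

In a real exchange the client sends the request to UDP port 69 and acknowledges each data block before the server sends the next one.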
I may try that to keep my main box cool!
5.56mm