Intel Solution #1: Xeon and Physical Address Extensions (PAE)

Many of our readers are hardware veterans and will point out that the current 32-bit Xeon CPUs from Intel are not limited to 4 GB of memory. Indeed, Intel's latest E7500 chipset for the Pentium 4 Xeon supports up to 16 GB of RAM, and the slightly older Profusion chipset for the Pentium III Xeon supports up to 32 GB of RAM. Intel's Xeon CPUs feature Physical Address Extensions (PAE), which can use a 36-bit address bus. So, in theory, a Xeon (or any current Intel CPU for that matter) can access up to 64 GB (2^36 bytes). So is the problem solved, with no point in migrating to a new 64-bit platform and investing in new software?

The first problem is that you need proper software support to access more than 4 GB on a 32-bit PAE CPU. First of all, you need Windows 2000 Advanced Server, which can address up to 8 GB, or the extremely expensive Windows 2000 Datacenter for up to 64 GB. Then you also need software that makes use of Microsoft's Address Windowing Extensions (AWE) API. Only then can you use more than 4 GB of memory. By the way, the Linux kernel 2.4 also supports PAE and thus more than 4 GB of memory in total, but each process can only use 4 GB of memory.

So, let us investigate Windows AWE a bit more. AWE works with an AWE window that exists within the 4 GB address space. As 32-bit virtual addresses can only cover 4 GB of memory at any given time, every time you need something that the OS has put above the 4 GB RAM limit, the AWE window needs to be remapped to the location where that data is stored.
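To make the windowing idea concrete, here is a minimal Python sketch. It is a toy model and not the AWE API: the real API maps sets of physical pages into a reserved virtual region, whereas this sketch assumes a single sliding window. The class name, the 256 MB window size and the two data regions are all hypothetical choices made for illustration. It only counts how often the window would have to be remapped for a given access pattern.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

GIB = 1 << 30
MIB = 1 << 20


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits."""
    return 1 << address_bits


@dataclass
class WindowedMemory:
    """Toy model: one fixed-size window that must be remapped
    whenever an access falls outside the currently mapped region."""
    window_size: int
    mapped_base: Optional[int] = None
    remaps: int = 0

    def access(self, physical_address: int) -> None:
        # Align the address down to the start of its window-sized region.
        base = physical_address - (physical_address % self.window_size)
        if base != self.mapped_base:
            self.mapped_base = base
            self.remaps += 1


def count_remaps(addresses: Iterable[int], window_size: int) -> int:
    """Replay an access pattern and return the number of remaps needed."""
    memory = WindowedMemory(window_size)
    for address in addresses:
        memory.access(address)
    return memory.remaps


# Two hypothetical data regions above the 4 GB line, 1 GB apart.
region_a = [5 * GIB + i * 4096 for i in range(1000)]
region_b = [6 * GIB + i * 4096 for i in range(1000)]

# Same accesses, two orders: grouped by region vs. alternating between regions.
grouped = region_a + region_b
interleaved = [addr for pair in zip(region_a, region_b) for addr in pair]

WINDOW = 256 * MIB  # assumed window size, not a value from the AWE documentation

print(addressable_bytes(32) // GIB, "GB with 32 address bits")
print(addressable_bytes(36) // GIB, "GB with 36 address bits (PAE)")
print("grouped accesses:", count_remaps(grouped, WINDOW), "remaps")
print("interleaved accesses:", count_remaps(interleaved, WINDOW), "remaps")
```

In this model the same 2000 accesses need either two remaps or one remap per access, depending purely on the order in which they are issued. Any cost attached to a single remap is therefore multiplied by how poorly the access pattern respects data locality.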
As you can see in the model above, AWE comes with a lot of overhead. As Windows has to keep track of the pages of memory, mapping AWE memory onto the AWE window is a very memory-intensive and slow operation that takes not a few nanoseconds, but tens of microseconds! So, if you have to access a lot of different locations in the memory above 4 GB, a lot of remapping and bookkeeping must be done. AWE memory might still be quite a bit faster than accessing the hard disk, but it is 10 to 100 times slower than normal memory use. Optimizations in an application to improve data locality can minimize the need to shuffle AWE windows around, and so improve performance with AWE memory to some degree.

The result is that, for the most part, AWE memory is only interesting for caching databases, to avoid accessing the disk system. But a workstation user who needs to work fluidly and efficiently with massive datasets has no use for it.

So, for high-end workstation users, the 32-bit Xeon with or without PAE looks much less attractive than AMD's Opteron. Even without 64-bit x86-64 workstation applications, AMD's flagship CPU should do very well as long as the x86-64 version of Windows arrives in a timely fashion. In that case, Hammer users will be able to assign 4 GB to their favorite applications (and use the rest for the OS and other applications), while Xeon users will still be limited to 2-3 GB per application.

Intel Solution #2: Deerfield
Long before marketing decided to name Intel's IA-64 processor "Itanium," the IA-64 project was codenamed "P7." If you consider that the Pentium Pro had the codename "P6" and Willamette was codenamed "P68," it is clear that years ago Intel thought IA-64 would take over around the time that 64-bit addressing became necessary to compete in the workstation, server and even desktop markets. Looking at the massive Itanium modules, with their rather mediocre performance, it is pretty hard to imagine that Intel expects IA-64 to take over from 32-bit x86 in the near future.
Figure: Dual Itanium - a massive module

Right now it seems that IA-64 is a total failure. Today this is true from a commercial point of view, but from a technical point of view, Itanium still has merits.

Figure: The Itanium chip itself is only 25 million transistors; you can see the separate L3-cache chips above
Yes, in total, an Itanium module features 325 million transistors, and at 130 Watts it is a power-guzzling beast. Nevertheless, it must be noted that the Itanium core itself (including 32 KB L1 and 96 KB L2 caches) features only 25 million transistors, while the four L3 cache chips are good for 75 million transistors each. In other words, IA-64 has kept at least one promise: it saves transistors in the decoding and scheduling part, and should theoretically offer better IPC by using those transistors for more registers and execution units.

But IA-64's first implementation in Itanium seems to have failed to achieve significantly higher IPC, as Itanium delivers less performance per clock than essentially all of the RISC competitors it seeks to replace, according to SPECint2000. Indeed, in terms of performance per clock, the Itanium is behind the MIPS R1x000 series, HP's PA-RISC 8x000 chips, IBM's POWER3 and POWER4 architectures, Fujitsu's SPARC64 GP, the Alpha 21264, Sun's UltraSPARC II and III, and Intel's own Pentium III. IPC is only one factor in overall performance, however: while Itanium does pull ahead of the x86 world's best, the Pentium 4 and Athlon, in SPECint/GHz, it cannot match the clock rate of either chip.

Itanium/Merced is just one implementation of IA-64, and all indications are that Itanium 2, aka McKinley, will be significantly improved. The last time we spoke with Intel, they pointed towards Deerfield as Intel's upcoming 64-bit workstation CPU. Deerfield is a low-cost version of Madison, the 0.13µ version of Intel's improved IA-64 McKinley processor. Essentially, it is a 0.13µ McKinley, while Madison increases the on-chip cache sizes. Mike Magee of the Inquirer reported that Deerfield would be launched as early as the second quarter of 2003:
"Madison and Deerfield are slated for Q2 of next year, with 3MB and 4MB caches respectively, and again using the E8870 chipset."
In other words, Deerfield could be a significant, though slightly late and more expensive, competitor to AMD's Opteron, as it is based on the McKinley core, which offers much higher performance than Itanium. According to a report at the Inquirer, the 1 GHz Itanium 2 ("McKinley") outperforms the current 800 MHz Itanium by roughly 90% in both SPECint and SPECfp (760 vs 400, 1350 vs 701). Intel itself, meanwhile, indicates that McKinley will deliver 70% better performance in SPECint2000 and 75% better performance in SPECfp2000 (see page 6 of this document for details).

The biggest problem for Intel, however, will not be performance, but getting the major ISVs to produce IA-64 versions of their software. At this point in time, we have yet to see one major workstation application for the Itanium, as Intel's IA-64 CPU has been positioned towards the high-end server market for the most part.

Intel Solution #3: Prescott

Prescott is Intel's next-generation Pentium 4 processor line, which will have close to 100 million transistors and will be produced on a 90-nanometer process. The high-end desktop chip (still 478-pin) has been rumored to include "Yamhill" technology, a sort of "Intel x86-64." Our sources confirmed that Prescott does indeed have a lot more cache on board - a logical evolution after Northwood - most likely a larger 1 MB L2 cache. We also received some confirmation that Prescott has quite a few architectural improvements, but no indication of x86-64 support so far.

Apparently, Intel has studied x86-64, but canceled the project. It is still possible that the x86-64 extensions are in the chip, but not activated. Two independent sources on the Internet seem to confirm the cancellation. First of all, the Inquirer reported that Paul Otellini underlined the importance of IA-64:
"A REPORT QUOTED senior Intel executive Paul Otellini as saying the firm would not produce a 64-bit backward processor compatible with 32-bit code. Paul Otellini, speaking at a meeting in New York earlier today, said Intel's future was firmly in the Itanium camp and he confirmed earlier INQUIRER reports that Madison is slated for next year and will include 3MB and 6MB caches."
Secondly, the online CV of an Intel engineer, Andy Glew, says:
"IA32+ - yet another canceled project, which proposed to extend the IA32/x86 instruction set to 64 bits. I worked on 64 bit page tables, obtaining a patent on variable page size encoding."
This is not really surprising. After all, an Intel version of x86-64 would be devastating for IA-64 software support. Why would any ISV invest large amounts of money in IA-64 software development if there is a much easier alternative to develop for?

As our CeBIT report indicated, Intel wants to bring HyperThreading technology to the home desktop, and Prescott will be the first desktop processor to improve performance with HyperThreading. Also, Prescott will feature a 667 MHz FSB around Q2 2003, which will be fed by dual-channel PC2700 DDR SDRAM.

With the evidence we have today, we must conclude that Prescott will still face the 4 GB limit (without PAE), and as such is not a solution for those workstation users who need more than 4 GB. Prescott is a very dangerous competitor to AMD's desktop Clawhammer, as is Nocona (the "Xeon Prescott") to AMD's Opteron, but it still leaves a window of opportunity open for the Opteron in the high-end workstation market, which will need more than the 3 GB of process address space that the 32-bit versions of Windows (XP) can offer.

Intel Solution #4: Tejas

Little is known about Tejas, which is the next incarnation of the Pentium 4 line. Tejas seems to be scheduled to launch in the first half of 2004. At that time, the majority of workstation users will probably need more than 4 GB, so Intel needs to do something to break the 4 GB barrier. Relying on Deerfield alone seems very risky. At first I could not believe my eyes when some of our sources indicated IA-64 support within Tejas! But then I read the CV of Andy Glew, the Intel engineer, a bit further:
"Tejas - evaluated support for IA64 within a mainstream IA32 processor."
Tejas is too far off to speculate whether or not Intel would integrate IA-64 in a 32-bit x86 CPU, let alone how. But it is clear that Intel is very serious about migrating to IA-64. Furthermore, Tejas is rumored to feature a faster FSB (1200 MHz?). However, considering all the differences between x86 and IA-64, the chances of a true hybrid processor seem remote at best.

Conclusion

Considering that Intel is on the verge of launching the Itanium 2 and has another four IA-64 cores in the pipeline (Madison, Deerfield, Montecito (90 nm), Chivano (90 nm)), it is very unlikely that Intel would kill off ISV investments in IA-64 software by introducing its own version of x86-64.

In the high-end server market, Intel's Itanium 2 should start to pick up, as it performs pretty well, and software support should begin to improve. The original Itanium hasn't exactly paved the way for its successor, though, so Itanium 2 will still be starting from effectively ground zero in terms of software support and user base. Meanwhile, AMD's Opteron will probably only gain market share slowly, as this market is driven less by performance and more by software support, reputation, perceived reliability and the force of habit ("nobody gets fired for choosing Intel-based servers"). Let us not forget that by volume, 89% of all servers ship with Intel CPUs! And it remains to be seen how quickly the big database vendors (Oracle, Sybase, Microsoft) will produce fully 64-bit x86-64 versions for the Opteron.

However, in the workstation market, the Opteron can be a very effective weapon. As we have reported before, the Opteron's platform is very scalable, and HyperTransport is a very elegant way of interconnecting the ASICs, making motherboards very flexible and less expensive. Running a 64-bit version of Windows, the Opteron can offer 4 GB to each 32-bit process without any performance hit.
Because the Opteron is significantly improved over the Athlon MP, we expect it to perform exceptionally well in workstation applications, and these two advantages might increase AMD's popularity in the workstation market. In the longer term, this might encourage workstation ISVs to develop and launch x86-64 versions of their software.

To counter the threat of Opteron, Intel will most likely opt to push Deerfield in the high-end workstation market, while a Xeon version of Tejas could smooth the migration from 32-bit x86 to 64-bit IA-64. Nevertheless, there is an important window of opportunity for AMD in 2003. This wormhole to the workstation galaxy will probably collapse in 2004, when Tejas enters the market and Intel has gathered enough software support for Deerfield.

For the immediate future, however, we have Itanium 2 to look forward to, and beyond that, in 2003, Opteron. We'll be taking a look at both in the third part of Chris' Volume Multi-Processor Systems series. If you aren't familiar with it yet, you may want to read over Part 1 and Part 2.

All Content is Copyright (C) 1998-2002 Ace's Hardware. All Rights Reserved.