The Looming Battle in 64 Bit Land

By Paul DeMone, Updated: Jun 20, 2000

Clash of the Titans

The ongoing knife fight between Intel and AMD for mind share and sockets in the 32-bit x86 marketplace has been widely publicized. This is partly because it gives millions of PC buyers immediately tangible benefits: the rapid introduction of faster processors by both sides, along with accompanying price cuts on existing parts. It is also due to the personal computer’s increasingly mainstream role in popular culture. With this wider exposure, Intel and AMD sometimes take on the surrogate role of a favored home town sports team in the minds of vocal, partisan groups of fans on each side.

But there is another important, though lesser known, processor battle looming. It is in the realm of the expensive, power hungry, high pin-count 64-bit processors used in applications such as high end technical workstations and departmental and enterprise class servers. These 'big iron' processors sell in volumes two orders of magnitude lower than 32-bit x86 devices, and are often priced similarly to an entire top-of-the-line PC. Nevertheless these prestigious, if somewhat esoteric, microprocessors drive the sales of tens of billions of dollars of large computer systems per year, which power many of our large corporations and institutions. The market for this high margin hardware is poised for explosive growth to enable the widely anticipated internet economy of the future.

The Players

The high end computing market is currently the exclusive reserve of 64-bit RISC processor families such as the Compaq Alpha, SGI MIPS, HP PA-RISC, IBM Power, and Sun SPARC. Although Intel has had some success driving its Xeon processor line into low end servers, the x86 instruction set architecture is 32 bits, while many of the large applications that run on high-end machines, such as database management systems (DBMS) and computer-aided design (CAD), demand flat address spaces larger than the 4 Gbytes that 32-bit addresses can reach, something only 64-bit processors can provide. Intel and AMD both plan to tackle this market, but are going about it in quite different ways. For the last six years Intel and HP have been developing a new RISC-like architecture called IA-64 that includes a compatibility mode allowing it to also run x86 code. AMD is taking the bold and controversial step of extending the x86 architecture (once again!) into full '64-bit-hood' while retaining the ability to freely run legacy x86 code.
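The 4 Gbyte ceiling follows directly from pointer width: an n-bit address can name 2^n distinct bytes. The minimal Python sketch below shows why a large DBMS working set cannot be mapped flat with 32-bit addresses. The function names and the 8 Gbyte buffer pool are illustrative assumptions, and real 64-bit processors of this generation implement fewer than 64 virtual address bits, so the 64-bit figure is an architectural ceiling rather than what any shipping chip supports.

```python
# Sketch: size of a flat, byte-addressable address space for a given pointer
# width. This is the architectural ceiling only; shipping 64-bit CPUs
# implement fewer virtual address bits than the full 64.

GIB = 2 ** 30  # bytes per gibibyte


def flat_address_space_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses reachable with pointer_bits-bit pointers."""
    return 2 ** pointer_bits


def fits_in_flat_space(bytes_needed: int, pointer_bits: int) -> bool:
    """True if a single flat mapping of bytes_needed fits in the address space."""
    return bytes_needed <= flat_address_space_bytes(pointer_bits)


for bits in (32, 64):
    size = flat_address_space_bytes(bits)
    print(f"{bits}-bit pointers: {size} bytes = {size // GIB} GiB")

# A hypothetical 8 GiB database buffer pool
buffer_pool = 8 * GIB
print("fits with 32-bit pointers:", fits_in_flat_space(buffer_pool, 32))
print("fits with 64-bit pointers:", fits_in_flat_space(buffer_pool, 64))
```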

If the current plans of some of the contestants are carried out, the 64-bit high end processor market will become a little less crowded within a few years despite the arrival of IA-64. SGI has announced that it intends to replace the MIPS RISC processors in its mid range and high end systems with Intel IA-64 processors. Curiously, both IBM and Compaq are planning the schizophrenic strategy of offering systems based on both IA-64 processors and their own respective RISC processors. HP is officially in the same camp as SGI in that it has announced its intention to replace its PA-RISC family with IA-64. Unofficially, however, HP seems to stand with IBM and Compaq because it has publicly disclosed a development roadmap of future PA-RISC CPUs that stretches surprisingly far into the future. Sun Microsystems is the lone holdout against surrendering to, or even fraternizing with, IA-64. Although Sun has made some unconvincing noises about porting its Solaris operating system to IA-64, it has no plans to offer IA-64 based hardware. Sun seems prepared to ride the SPARC horse exclusively as far as it will go.

Currently Shipping 64 bit Processors

The approximate competitive positioning (based on integer and floating point performance) of currently available 64-bit high end microprocessors is shown in Figure 1. The performance of the Intel 'Coppermine' Pentium III 1.0 GHz x86 32-bit processor is also included as a reference point. The position of the current Sun high end SPARC processor is estimated because Sun has not yet seen fit to disclose SPEC2000 benchmark results for it.


Figure 1. Currently Shipping 64-bit RISC Processors (and high end x86)

Note that the Intel 32-bit x86 processor achieves respectably high integer performance thanks to the high clock rate afforded by its early access to 0.18 um process technology (nearly all the RISC processors shown are manufactured in 0.22 to 0.28 um processes). Most of the flagship parts of the competing RISC processor families, along with the Merced/Itanium IA-64 processor, will be manufactured in 0.18 um technologies within the next six to twelve months.

Alpha's well on the Western (and Eastern) Front

The current performance leader is the Compaq Alpha EV67 (also known as the 21264A) running at 667 MHz. The Alpha design team has recently come through several difficult years: the sale of its Hudson, Mass. wafer fab to Intel (as part of a complex settlement of Digital Equipment Corporation's (DEC) patent infringement suit against the chip giant), the subsequent acquisition of DEC by Compaq, and the team's own relocation from Hudson to Shrewsbury. This internal distraction, plus the complexity of the EV6 processor core design, has caused Alpha to fall out of its traditional position as clock rate leader. But the generous execution resources of the out-of-order EV6 design, coupled with the high bandwidth, low latency Tsunami chipset and mature compiler technology, have allowed Alpha to retain performance leadership in SPEC2000, as well as SPECfp95, despite competition from processors with clock rates as much as 50% higher.

Within the next six months Compaq should be shipping the first 0.18 um version of the EV6 core, known as EV68 or 21264B. This device, disclosed at ISSCC 2000, is built in a 0.18 um process but retains the die size and metal design rules of the 0.25 um EV67. This approach preserves compatibility with the existing 588-pin CPGA package and gets the EV68 into existing systems faster, but doesn't realize the full benefits of the shrink. The characteristics of the EV68 process are shown below in Table 1, along with those of the Intel P856.5 process (0.25 um with a 5% shrink) and the Intel P858 and IBM CMOS8S 0.18 um processes for comparison.

                       Intel P856.5  Alpha EV68          Intel P858   IBM CMOS8S
                       (0.25 um)     (hybrid 0.18/0.25)  (0.18 um)    (0.18 um)
Transistor Ldrawn      0.25 um       0.18 um             0.18 um      0.18 um
Transistor Leffective  0.20 um       0.092 um            0.14 um      < 0.13 um
Transistor Tox         41 Å          36 Å                30 Å         36 Å
Interconnect           Aluminum      Aluminum            Aluminum     Copper
M1 contacted pitch     0.61 um       0.91 um             0.50 um      0.49 um
M2 contacted pitch     0.88 um       0.91 um             0.64 um      0.63 um
M3 contacted pitch     0.88 um       Ref Plane           0.64 um      0.63 um
M4 contacted pitch     1.73 um       2.24 um             1.08 um      0.63 um
M5 contacted pitch     2.43 um       2.24 um             1.60 um      0.63 um
M6 contacted pitch     N/A           Ref Plane           1.72 um      1.26 um
M7 contacted pitch     N/A           2.24 um             N/A          1.26 um

Table 1. Comparison of Representative 0.25 and 0.18 um Processes
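A quick way to see how little of the 0.18 um shrink the initial EV68 exploits is to compare its metal pitches against the representative 0.25 um process in Table 1. The Python sketch below copies the M1 to M5 contacted pitches from the table (skipping EV68's reference plane layers) and computes the new/old ratio per layer, alongside the (0.18/0.25)^2 area factor an ideal full linear shrink would give. It is a rough illustration only; die area depends on much more than metal pitch.

```python
# Metal contacted pitches in um, copied from Table 1.
# EV68 "Ref Plane" layers (M3) are left out because they carry no signal pitch.
P856_5 = {"M1": 0.61, "M2": 0.88, "M3": 0.88, "M4": 1.73, "M5": 2.43}
EV68 = {"M1": 0.91, "M2": 0.91, "M4": 2.24, "M5": 2.24}
P858 = {"M1": 0.50, "M2": 0.64, "M3": 0.64, "M4": 1.08, "M5": 1.60}

# An ideal linear shrink from 0.25 um to 0.18 um scales area by (0.18/0.25)^2.
IDEAL_AREA_FACTOR = (0.18 / 0.25) ** 2


def pitch_ratios(new, old):
    """Ratio new/old for every metal layer present in both processes."""
    return {layer: round(new[layer] / old[layer], 2) for layer in new if layer in old}


print(f"ideal area factor for a full shrink: {IDEAL_AREA_FACTOR:.2f}")
print("P858 vs P856.5:", pitch_ratios(P858, P856_5))  # all layers shrink
print("EV68 vs P856.5:", pitch_ratios(EV68, P856_5))  # several layers are looser
```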

Despite only partial exploitation of its 0.18 um process, the EV68 will run at clock rates exceeding 1 GHz. Future versions of the EV68 that fully exploit the 0.18 um process will achieve about 25% smaller die size, higher clock rates and lower clock-normalized power consumption than the initial device. A fully 0.18 um EV68 will likely use the smaller processor core to make room for a moderately sized on-chip L2 cache. Beyond EV68 is an ambitious MPU design for high-end servers called the EV7 or 21364. The EV7 is based on the EV6x processor core but adds a large (1.5 Mbyte) on-chip L2 cache, an on-chip memory controller/interface, and four sets of bidirectional interprocessor communication links. It is predicted to ship in systems next year. Compaq and IBM recently disclosed that IBM Microelectronics will start manufacturing Alpha processors early next year in IBM's CMOS8S process and possibly its CMOS8S2 silicon-on-insulator (SOI) process. At the 1999 Microprocessor Forum, Compaq disclosed that the formidable eight-issue-wide, simultaneous multithreading (SMT) EV8, the fourth generation Alpha processor core, would be manufactured starting in 2002 in a 0.125 um, copper, low-k dielectric, SOI compatible process. Those specifications strongly suggest that the manufacturing relationship with IBM will become even more intimate in the future.

PA-RISC: Not Dead Yet!

Throughout its existence, Hewlett-Packard's Precision Architecture RISC family has consistently been a top performer. Remarkably, PA-RISC has stayed close to the front of the RISC pack (and occasionally taken the lead) over the last five years on the strength of a processor core design that is nearly as old as Intel's P6. This core, the first 64-bit PA-RISC implementation, was first used in the 0.5 um PA-8000, which relied on large external L1 caches to keep pace with Alpha processors clocked more than twice as fast as its leisurely 180 MHz. It was given a makeover in the same process and became the PA-8200, which ran as fast as 240 MHz. Given new life as the gargantuan PA-8500 in Intel's 0.25 um P856 process, this core was coupled with 1.5 Mbyte of on-chip L1 cache to act as the strongest challenger to Compaq's sophisticated 0.35 um EV6. The PA-8500 was recently taken into the shop for a tune-up and came out as the PA-8600. Despite being made in essentially the same 0.25 um process as the PA-8500, the PA-8600 relies on critical path cleanups and fine tuning of its cache architecture to deliver a 20% performance increase, keeping it right on the tail of the 0.25 um EV67.

Although HP is publicly committed to switching its high end product line from PA-RISC to the IA-64 architecture it jointly designed with Intel, it appears to be in no hurry at all to make the transition. On its public PA-RISC roadmap, HP has identified three additional designs: the PA-8700, the PA-8800, and the PA-8900. The PA-8700 is described as basically a PA-8600 shrunk to a 0.18 um copper SOI process, with its on-chip caches enlarged by 50% to 2.25 Mbyte. The fact that the PA-8700 will use an SOI copper process immediately ruled out Intel as the manufacturer. And it wouldn't take Sherlock Holmes to notice that the PA-8700 process metal pitches identified in a recent white paper on the HP web site are identical to those of IBM's CMOS8S process shown in Table 1. Indeed, it was recently confirmed that IBM Microelectronics will manufacture the PA-8700 starting early next year.

There is little information to suggest that the PA-8800 and PA-8900 are anything more than continued minor enhancements and process shrinks of the remarkably long-lived PA-8000 processor core. Even more astounding is that HP places the performance levels of the PA-8800 and PA-8900 on par with the 2nd and 3rd generation IA-64 processors, McKinley and Deerfield respectively. No doubt HP's reluctance to part with its proprietary RISC processor line causes some degree of consternation in Santa Clara. The fact that the originator of the EPIC processor design concept sees fit to spend precious resources keeping its RISC family refreshed until at least 2004 or so is proving an embarrassment to Intel's attempt to position its first few IA-64 processors as the divinely appointed successor to all RISC-based computing.

Itanium: Super Alloy or Toxic Waste?

It seems hard to believe but the most widely endorsed and adopted 64-bit architecture for future systems is an unproven and controversial design whose troubled first implementation is three years late to market. The Intel Merced/Itanium, the first impression of the enormously complex IA-64 instruction set architecture to be set down in silicon, is an example of how technological issues sometimes matter little in the face of powerful vested business interests and alliances.

The basic underlying idea of IA-64, which its creators call EPIC (Explicitly Parallel Instruction Computing), goes back nearly 11 years to a research project started at HP Labs. At the time, the first superscalar processors were being designed and a lot of effort was being expended to understand how to design the out-of-order execution processors that would follow them. It is quite ironic that the thinking that led to the hideously complex IA-64 architecture originated as a retreat to the keep-it-simple-stupid (KISS) design principles of the early RISC era, in reaction to the daunting challenges faced by superscalar pioneers. EPIC proponents were seduced by the siren call of Very Long Instruction Word (VLIW) like techniques for building very wide issue processors with minimal control logic. There is no free lunch, however, and the downside of EPIC is its reliance on a compiler that must be practically clairvoyant in choosing the optimal instruction schedule. No one has yet coded an algorithm to predict the future, so the general compiler strategy is to generate code that runs as fast as possible along the execution path predicted at compile time (often with the help of profiling data) to be the most likely. The compiler also has to generate code that detects when these compile-time assumptions fail and patches up the computational state sufficiently to produce the correct results, albeit more slowly.
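The trade-off the compiler faces can be put as a simple expected-value calculation. In the Python sketch below, `fast` is the cycle count when the compile-time prediction holds, `recovery` is the extra cost of the check-and-fix-up code when it does not, and `safe` is a conservative schedule that makes no assumption. All three are hypothetical inputs chosen for illustration, not measured IA-64 numbers, and the function names are invented for this example.

```python
def expected_cycles(p_correct, fast, recovery):
    """Expected cycles for code scheduled for the compiler's predicted path.

    p_correct: probability that the compile-time assumption holds at run time
    fast:      cycles when the assumption holds
    recovery:  extra cycles for the check and fix-up code when it does not
    """
    return p_correct * fast + (1 - p_correct) * (fast + recovery)


def break_even_probability(fast, recovery, safe):
    """p_correct at which the speculative schedule ties a conservative one.

    Solves fast + (1 - p) * recovery == safe for p.
    """
    return 1 - (safe - fast) / recovery


# Hypothetical cycle counts, for illustration only.
FAST, RECOVERY, SAFE = 10, 30, 16

print("break-even p:", break_even_probability(FAST, RECOVERY, SAFE))
for p in (0.5, 0.8, 0.9, 0.99):
    print(f"p={p:.2f}: speculative {expected_cycles(p, FAST, RECOVERY):.1f} vs safe {SAFE}")
```

With these illustrative values, speculation only pays off if the compiler's prediction is right more than 80% of the time; below that, the recovery code costs more than the aggressive schedule saves.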

The comparison between EPIC designs like IA-64 and dynamically scheduled superscalar processors (CISC or RISC) is in many ways similar to that between the centrally planned command economies of the old Soviet era and laissez-faire capitalism. With the self-assured arrogance of faceless central planners working on their next five year plan, EPIC designers assumed that their clairvoyant compilers, combined with wide issue, high clock rate but inflexible processor hardware, would be good enough to overcome the more dynamic and adaptive CPUs of their competitors. The hardware of a dynamically scheduled processor may not have the time, resources, or instruction search window available to an EPIC compiler to search out opportunities for instruction level parallelism (ILP). But it has one huge advantage: the ability and opportunity to adapt in real time to unexpected changes in program and data flow during execution arising from external factors (cache or TLB misses, interrupts, etc.) or unusual program input combinations.

Just as a five year economic plan cannot predict a massive crop failure in year four and be prepared to take quick corrective measures, an EPIC processor cannot predict which load will miss in every level of the cache hierarchy and freeze its entire in-order execution pipeline for hundreds of clock cycles. A free market economy reacts to a crop failure by increasing the price of the affected commodity, which attracts new suppliers or substitutes. Similarly, a dynamically scheduled processor reacts to a cache miss by initiating the necessary memory operation and using the opportunity to execute non-dependent instructions until either these run out or a reordering hardware resource, such as the pool of rename registers, is exhausted.
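The difference between stalling on a cache miss and working around it can be shown with a toy cycle-count model. In the Python sketch below, the machine issues one instruction per cycle, the miss is a hypothetical 100 cycles, and the `window` parameter stands in for a finite pool of rename registers or reorder buffer entries. The in-order model stalls at the first instruction that needs the missing data, while the out-of-order model keeps issuing independent instructions until they run out or the window fills. None of this models a real CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    latency: int          # cycles until the result can be used
    deps: tuple = ()      # names of earlier instructions whose results are needed


def run_in_order(prog):
    """Single issue, strict program order: an instruction whose operands are not
    ready stalls everything behind it. Returns the cycle the last result is ready."""
    ready_at = {}
    cycle = 0
    for ins in prog:
        operands_ready = max((ready_at[d] for d in ins.deps), default=0)
        cycle = max(cycle, operands_ready)   # stall until operands arrive
        ready_at[ins.name] = cycle + ins.latency
        cycle += 1                           # next issue slot
    return max(ready_at.values())


def run_out_of_order(prog, window):
    """Single issue, but each cycle the oldest ready instruction among the
    `window` oldest not-yet-retired ones may issue. Retirement is in order,
    so a load stuck at the head eventually fills the window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    ready_at = {}
    head = 0                                 # oldest instruction not yet retired
    cycle = 0
    while head < len(prog):
        # Retire finished instructions in program order, freeing window slots.
        while (head < len(prog) and prog[head].name in ready_at
               and ready_at[prog[head].name] <= cycle):
            head += 1
        # Issue the oldest not-yet-issued instruction whose operands are ready.
        for ins in prog[head:head + window]:
            if ins.name in ready_at:
                continue
            if all(d in ready_at and ready_at[d] <= cycle for d in ins.deps):
                ready_at[ins.name] = cycle + ins.latency
                break
        cycle += 1
    return max(ready_at.values())


# A load that misses all the way to memory (hypothetical 100 cycles), one
# instruction that needs its result, then eight independent one-cycle operations.
prog = ([Instr("ld", 100), Instr("use", 1, ("ld",))]
        + [Instr(f"alu{i}", 1) for i in range(8)])

print("in-order:          ", run_in_order(prog))
print("out-of-order, w=16:", run_out_of_order(prog, 16))
print("out-of-order, w=4: ", run_out_of_order(prog, 4))
```

With these inputs the in-order model finishes at cycle 109 and the out-of-order model with a 16-entry window at cycle 101. Shrinking the window to 4 entries gives 107 because the window fills behind the stalled load before all the independent work has issued, and a 1-entry window degenerates to the in-order result.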

To their credit, the creators of EPIC recognized the limitations of compile-time prognostication and attempted to cover their assumptions with a variety of ad hoc architectural features that the compiler can use to obtain some of the benefits of dynamically scheduled execution under specific and limited circumstances. For example, rotating registers provide some of the benefit of true register renaming by avoiding the debilitating effects of false register dependencies in the bodies of loops. Speculative loads provide a limited ability to overlap a potentially long latency memory access with other instructions by allowing the compiler to hoist the load above control dependencies. EPIC designers also recognized that they would rely heavily on compiler optimization driven by run-time profiling data, and built in the ability for the compiler to set 'bias' hints in individual instructions that tell the otherwise inflexible EPIC hardware which execution strategy to follow. IA-64 compilers can control how individual conditional branch instructions are handled by hardware: whether dynamic branch prediction resources should be spent predicting that branch, or whether the hardware should simply assume statically that the branch is always taken, or never taken.
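The control speculation idea can be sketched conceptually. IA-64 implements it with speculative loads (ld.s) that set a NaT ("Not a Thing") bit instead of faulting, and a later check instruction (chk.s) that branches to compiler-generated recovery code if the bit is set. The Python sketch below only models that idea for correctness, not timing: `load_speculative` returns a `NAT` token instead of raising, and `check` re-executes the load non-speculatively when it sees the token. The function names and the dictionary standing in for memory are illustrative assumptions, not IA-64 semantics.

```python
class _NaT:
    """Deferred-fault token, loosely modeled on IA-64's NaT ("Not a Thing") bit."""

    def __repr__(self):
        return "NaT"


NAT = _NaT()


def load(memory, addr):
    """Ordinary load: faults immediately if the address is not mapped."""
    if addr not in memory:
        raise MemoryError(f"fault at address {addr:#x}")
    return memory[addr]


def load_speculative(memory, addr):
    """Speculative load: returns NAT instead of faulting on a bad address."""
    return memory.get(addr, NAT)


def check(value, recover):
    """If the speculative load deferred a fault, run the recovery code
    (here: the ordinary load, which faults for real if still invalid)."""
    return recover() if value is NAT else value


def original(memory, ptr, take_branch):
    """Load guarded by a branch: it cannot start until the branch resolves."""
    if take_branch:
        return load(memory, ptr) + 1
    return 0


def hoisted(memory, ptr, take_branch):
    """Same logic with the load moved above the branch by the compiler."""
    v = load_speculative(memory, ptr)   # on hardware, its latency could overlap earlier work
    if take_branch:
        v = check(v, lambda: load(memory, ptr))
        return v + 1
    return 0                            # a deferred fault on the untaken path is discarded


memory = {0x1000: 41}
print(hoisted(memory, 0x1000, True))   # 42
print(hoisted(memory, 0xDEAD, False))  # 0: the bad speculative load never faults
```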

The performance vs. cost trade-off of EPIC processors relative to dynamically scheduled superscalar RISC processors is still not known with certainty, and probably varies from application to application (e.g. embedded controller vs. technical workstation) and over time with the inexorable advancement of semiconductor technology. What is quite obvious is that one of the chief benefits of the EPIC design philosophy, hardware simplicity, is largely going to elude IA-64 implementations. The IA-64 ISA is a product of a joint design committee consisting of technical staff from both Intel and Hewlett-Packard. And it shows.

One defining decision of this committee was to include the entire x86 instruction set within the architecture in the form of a hardware-based compatibility mode. The large disparity in complexity between IA-64 and existing 64-bit instruction set architectures is revealed by the implementation technology and characteristics of the first implementation of each of these architectures, shown in Table 2.

Architecture   First Implementation   Technology   Area (mm²)    Transistors (millions)
IA-64          Merced/Itanium         0.18 um      >300 (est)    25.4
PA-RISC 2.0    PA-8000                0.50 um      347           3.9
SPARC V9       UltraSPARC             0.50 um      322           5.2
Alpha AXP      21064                  0.75 um      234           1.68
MIPS III       R4000SC                0.80 um      184           1.35

Table 2. Characteristics of the Initial Implementations of Various 64-bit ISAs
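One crude way to put the Table 2 figures on a common footing is to divide die area by the square of the feature size, which roughly cancels out the process generation. The Python sketch below does that. It enters Merced's ">300 (est)" area as exactly 300, so its result is a lower bound, and it ignores pads, on-chip caches (the PA-8000 had none, Merced has several), and other parts of a die that don't scale cleanly with feature size.

```python
# (feature size in um, die area in mm^2, transistors in millions), from Table 2.
# Merced's ">300 (est)" area is entered as 300, so its result is a lower bound.
FIRST_IMPLEMENTATIONS = {
    "IA-64 Merced/Itanium": (0.18, 300, 25.4),
    "PA-RISC 2.0 PA-8000": (0.50, 347, 3.9),
    "SPARC V9 UltraSPARC": (0.50, 322, 5.2),
    "Alpha AXP 21064": (0.75, 234, 1.68),
    "MIPS III R4000SC": (0.80, 184, 1.35),
}


def normalized_area(feature_um, area_mm2):
    """Die area in units of (feature size)^2: a crude, process-independent size."""
    feature_mm = feature_um / 1000.0
    return area_mm2 / feature_mm ** 2


for name, (feature, area, mtrans) in FIRST_IMPLEMENTATIONS.items():
    f2_millions = normalized_area(feature, area) / 1e6
    print(f"{name:22s} {f2_millions:8.0f} million F^2   {mtrans:5.2f} M transistors")
```

Even with that lower bound, Merced comes out at roughly 6.7 times the normalized area of the PA-8000 and more than 20 times that of the 21064.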

SPARC: Performance isn't Everything

Like PA-RISC, Sun Microsystems' SPARC design was one of the earliest commercially available RISC processors. And like PA-RISC, SPARC didn't turn 64 (bit) until middle age. The first 64-bit SPARC was the 0.5 um UltraSPARC. The UltraSPARC core design was subsequently ported to a 0.29 um process and renamed the UltraSPARC-II (US-II). Currently Sun is putting on a brave face, selling heavily discounted US-II systems running as fast as 450 MHz while it awaits its new UltraSPARC-III core, which couldn't come too soon. The 450 MHz US-II is in the embarrassing position of being outclassed by more than 2:1 on SPECint95 by high-end x86 processors, and overtaken on SPECfp95 too.

However, SPARC hasn't been close to the front of the RISC performance race in more than ten years and this fact hasn't seriously hindered the commercial success of Sun in either the technical workstation or server market. Sun developed an early software lead over other RISC processor families on the strength of the innovative and fast SPARCstation 1 and 2 systems in the late 1980's. On the strength of its gigantic software base, large scale multiprocessor servers, and a consistent product strategy revolving around Unix on SPARC, Sun Microsystems has retained a surprisingly loyal following in the IS departments of Fortune 500 companies despite SPARC's glaring performance shortcomings.

MIPS: From Intel Challenger to Super Mario

Like PA-RISC and SPARC, the highly respected MIPS architecture can trace its roots back to the earliest days of the RISC revolution. In fact, MIPS was so highly regarded that Compaq, Microsoft, DEC, and MIPS Technologies Inc. formed the Advanced Computing Environment (ACE) consortium in 1991, an initiative that once posed a credible threat to overthrow Intel's x86 hegemony on the PC desktop and replace it with MIPS processors. After some financial difficulties following the collapse of the ACE consortium in 1992 (from a combination of internal strife and Intel's heavy handed tactics in Asia to keep motherboard makers from supporting ACE) MIPS Technologies was acquired by Silicon Graphics Inc. (SGI). Around this time MIPS was able to deliver the first commercial true 64-bit microprocessor, the R4000. The R4000 was followed up by the slightly improved R4400. In 1995 the advanced, out-of-order execution, 0.35 um MIPS R10000 (R10K) core shipped at clock rates up to 200 MHz.

During the 1990's, low end 32 and 64-bit implementations of the MIPS ISA became very popular for embedded control applications. The success of the 64-bit R4300 processor in the Nintendo 64 video game console recently helped MIPS become the first RISC microprocessor architecture to sell over 100 million units. Unfortunately this success wasn't mirrored on the high end. In 1997 SGI canceled plans for new MIPS processor cores code named H1 and H2 and announced it would adopt IA-64. Subsequent delays to Merced/Itanium have left SGI vulnerable to its competitors because, unlike HP, SGI didn't keep its R10K and R12K processors competitive in clock frequency. The recent disclosure of respectable SPEC2000 benchmark scores for the 400 MHz R12K underlines SGI's perpetual, heroic efforts at system and compiler level tweaking to make up for the shortcomings of its semiconductor manufacturing partners. Ironically, SGI's MIPS Technologies subsidiary is being gradually spun out to further exploit its successes in the embedded control market even as its parent continues to suffer.

SGI has announced that it will continue tweaking the basic R10K core. The R14K is targeting a 400 to 450 MHz clock rate with an improved system interface. The R16K, which should follow next year with L1 caches doubled in size (to 64 Kbyte each), will run in the 600-800 MHz range. A decision on a further shrink of the R10K core into an R18K device will be made at a future date. Presumably the R18K's fate will depend on the degree of market acceptance of IA-64 based products. But none of these new devices is likely to do much to slow the MIPS ISA's full retreat from general purpose computing.

IBM's Power Trip

For a technology powerhouse that pioneered the modern RISC era a decade before anyone else, IBM's record of commercial success in RISC processors has been a mixed bag. In 1974 John Cocke and fellow researchers at IBM's Thomas J. Watson Research Center began researching the design of a central control processor for an all-digital telephone exchange. This research eventually evolved into what was known as the 801 project, which laid the foundations of the processor design and compiler technology for the RISC revolution a decade later.

In the seventies IBM was a colossal force in the computer industry, and a new method of designing CPUs that could make $60,000 worth of ECL chips and circuit boards equal or surpass the performance of multi-million dollar mainframe CPUs was not exactly welcome. IBM essentially sat on what it learned from the 801 project for more than a decade, until the RISC message was spread far and wide through the efforts of Bay Area university professors David Patterson and John Hennessy. When IBM did decide to enter the RISC market, it was with a product, the PC/RT, so pathetically underpowered that some industry observers thought IBM was trying to sabotage the RISC movement.

IBM's second effort, the RS/6000, was far more serious: a massive, multi-chip implementation (RIOS) of a RISC architecture called POWER. Although the RS/6000 never challenged other RISC or even CISC processors for the clock frequency crown, its ample low latency execution resources, custom cache chips, and huge memory bandwidth allowed it to turn in very respectable performance, particularly on floating point applications. The 1.0 um CMOS POWER CPU chipset was eventually shrunk to 0.70 um. A subsequent shrink and tweak became known as POWER2, while single chip versions of POWER, called RSC (RIOS Single Chip) and RSC2, were also created. More recently IBM has delivered full-blown single chip implementations of POWER called Power3 (0.25 um CMOS) and Power3-II (0.22 um CMOS, copper).

In late 1991 Apple Computer Inc. was searching for a RISC processor to replace the 68K CISC processor line in its Macintosh desktop computers. Early experimental Macintoshes were based on the Motorola 88K RISC processor line. For some reason Apple decided the 88K would never cut it, and convinced Motorola and IBM to jointly create a new RISC architecture called PowerPC that was basically POWER rationalized for desktop computing. The initial product, the PowerPC 601, was essentially an RSC2 modified to support both the POWER and PowerPC ISAs, with an 88K bus interface grafted on (Apple wanted to protect its early investment in chipsets). IBM and Motorola set up a joint CPU design center called Somerset and defined a three-product roadmap: the 603, the 604, and the 620. Only the PowerPC 620 was a full 64-bit version of the PowerPC architecture. But the 620 was years late coming to market and in the end shipped only in tiny quantities, presumably to fulfill certain contractual obligations. The first widely shipping 64-bit version of the PowerPC was the RS64, a design originally developed to provide a platform for the evolution of IBM's AS/400 line of minicomputers. The RS64-II, or Northstar, continues this line in 0.25 um CMOS technology. These processors turn in mediocre performance on technical and scientific computing applications, but they were designed mainly for commercial applications. There, they are competitive with the x86 Xeon and other RISC processors on the strength of their short, low latency pipelines and wide, high bandwidth system interfaces.

All in all, IBM's record in high-end microprocessors has been mediocre, given the enormous technical breadth and depth of this once formidable corporation. This is especially surprising considering its strength in inventing and exploiting breakthroughs in semiconductor technology that even chip giant Intel cannot begin to match. IBM seems to have recognized its underachiever status and has mobilized forces from all corners of its kingdom for a tremendous technological push aimed at preventing total IA-64 domination of the 64-bit processor market. The spearhead of this offensive is the packaging wonder known as POWER4, a massive processor manufactured in IBM's CMOS8S2 0.18 um copper interconnect SOI process. Each device incorporates two 5-issue-wide superscalar RISC cores that implement the 64-bit version of the PowerPC ISA. These two processors, operating at clock rates in excess of 1 GHz, are teamed with a large, shared on-chip L2 cache, a controller for an external L3 cache, and three sets of high bandwidth interprocessor communication links. This arrangement allows four POWER4 dice to be incorporated into a large multi-chip module (MCM), with each device fully connected to its three neighbors. The MCM is mounted within a thermal conduction module (TCM) of the type IBM perfected long ago for its mainframe CPUs. A single POWER4 TCM, about 4.5 inches on a side, incorporates all the elements of a fully connected 8-way multiprocessor system, with a tremendous amount of potential interprocessor, memory, and I/O bandwidth.
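The reason three link sets per die suffice is simple combinatorics: fully connecting n chips takes n(n-1)/2 point-to-point links, with n-1 ports on each chip. The small Python sketch below checks this for the four-die POWER4 module; the two constants come from the text, and the model deliberately ignores the L3, memory and I/O connections.

```python
from itertools import combinations

CHIPS_PER_MCM = 4     # POWER4 dice per multi-chip module (from the text)
CORES_PER_CHIP = 2    # processor cores per POWER4 die (from the text)

# Full connectivity: one point-to-point link for every pair of chips.
links = list(combinations(range(CHIPS_PER_MCM), 2))
ports_per_chip = {c: sum(c in link for link in links) for c in range(CHIPS_PER_MCM)}

print(len(links), "chip-to-chip links")             # 6
print("ports per chip:", ports_per_chip)            # {0: 3, 1: 3, 2: 3, 3: 3}
print(CHIPS_PER_MCM * CORES_PER_CHIP, "way SMP")    # 8 way SMP
```

A fifth die would need a fourth link set on every chip and ten links in total, which is one way to see why the module stops at four dice.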

The 64 bit Contestants

In all likelihood the Alpha EV68, the Merced/Itanium, and UltraSPARC-III will begin shipping in production systems within the next three to six months. These devices will enter a 64-bit processor market that is currently performance dominated by the Alpha EV67 and HP PA-8600 processors. Besides the obvious competition between huge multinational corporations, the upcoming 64-bit MPU battle will also represent a clash between competing ideas, approaches and philosophies of how to build the best high end microprocessor.

The Alpha EV6 and EV7 and PA-RISC 8x00 are complex, dynamically scheduled processors with relatively short execution pipelines of 7 stages each for simple instructions. At the other end of the spectrum are the Merced/Itanium and UltraSPARC-III, in-order designs with relatively long execution pipelines of 10 and 14 stages respectively. The pipeline organizations of these four processor cores are shown in Figure 2.


Figure 2. Execution Pipelines of the Major 64-bit MPUs

Remarkably, the extra logic needed to implement out-of-order execution with the short execution pipelines of the Alpha and PA-RISC processors doesn't seem to have had a noticeably negative effect on maximum clock rate, compared to the long pipelines of the theoretically simpler in-order Merced/Itanium and UltraSPARC-III processors. The design, manufacturing process, and estimated operational characteristics of these processors and some of their successors are shown in Table 3. I didn't include the POWER4 because virtually nothing has been disclosed about its processor core.

                          Alpha      Alpha      PA-RISC    PA-RISC    IA-64      IA-64      SPARC
                          EV68       EV7        PA-8600    PA-8700    Merced     McKinley   US-III
Process (um)              0.25/0.18  0.18       0.25       0.18       0.18       0.18       0.21
Substrate, metal          bulk, Al   bulk, Al   bulk, Al   SOI, Cu    bulk, Al   bulk, Al   bulk, Al
Leff (um)                 0.092      0.092      0.20       0.14       0.14       0.14       0.15
Area (mm²)                193        350        469        260        >300       >300       244
Transistors               15.2M      100M       130M       193M       25.4M      150M       23M
I-cache                   64K, 2w    64K, 2w    512K, 4w   768K, 4w   32K, 4w    ?          32K, 2w
D-cache                   64K, 2w    64K, 2w    1M, 4w     1.5M, 4w   32K, 4w    ?          64K, 4w
L2 cache                  -          1.5M, 8w   -          -          96K, 6w    ?          -
Clock (MHz)               1100       1500       600        1000       800        1200       800
Power (W)                 70         110        90         70         150        150        60
System bandwidth (GB/s)   3.2        12.8 mem   1.54       2.5        2.1        6.4        2.4
                                     28.8 ipc/io
Fetch/cycle               4          4          4          4          6          6          4
Issue/cycle               6          6          4          4          6          9          6
Retire/cycle              11         11         4          4          6          9          6
SPECint95                 65         90         45         75         50         90         45
SPECfp95                  110        160        65         110        80         150        70
SPECint2K                 750        1050       400        640        450        950        425
SPECfp2K                  950        1400       360        750        665        1250       600
Introduction              2H00       1H01       2Q00       2H01       2H00       2H01       2H00

Table 3. Characteristics of Near Future 64-bit High End MPUs
(includes estimated values)
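Dividing SPEC scores by clock frequency gives a rough measure of work per cycle, which helps illustrate the earlier point that clock rate alone doesn't determine performance. The Python sketch below applies this to the Table 3 rows; since many of those values are estimates, the derived figures are no more precise than their inputs, and the per-watt column is equally rough.

```python
# (clock MHz, power W, SPECint2K, SPECfp2K), copied from Table 3.
# Many of these values are estimates, so the derived ratios are estimates too.
TABLE3 = {
    "Alpha EV68": (1100, 70, 750, 950),
    "Alpha EV7": (1500, 110, 1050, 1400),
    "PA-8600": (600, 90, 400, 360),
    "PA-8700": (1000, 70, 640, 750),
    "IA-64 Merced": (800, 150, 450, 665),
    "IA-64 McKinley": (1200, 150, 950, 1250),
    "UltraSPARC-III": (800, 60, 425, 600),
}


def per_ghz(score, mhz):
    """SPEC score divided by clock frequency in GHz: a rough work-per-cycle measure."""
    return score / (mhz / 1000)


for cpu, (mhz, watts, sint, sfp) in TABLE3.items():
    print(f"{cpu:15s} int/GHz {per_ghz(sint, mhz):6.0f}  "
          f"fp/GHz {per_ghz(sfp, mhz):6.0f}  int/W {sint / watts:5.1f}")
```

For example, the EV68's estimated 750 SPECint2K at 1100 MHz works out to about 680 per GHz, against about 560 per GHz for Merced's 450 at 800 MHz.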

The battle for mind share, as well as market share, in the 64-bit high-end market will be fought on many battlefields, not just performance or technical excellence. High performance is no more a guarantee of commercial success than poor performance is a guarantee of commercial failure, as Alpha and SPARC respectively have aptly demonstrated over the last five years.

I have attempted to score the major 64-bit high-end processor families on some of these important, if less tangible or quantifiable, factors looking forward over the next several years; the results are in Table 4. Uniprocessor performance is relatively straightforward: Alpha is king, and it is up to IA-64 and Power to prove they can take the crown away. HP has traditionally kept PA-RISC performance close to that of Alpha, but the lack of a new core will start to hurt in the next few years. System bandwidth and multiprocessor scalability depend both on the features built into the MPU and on the chipset and system architectures supported. The Alpha EV7 and POWER4 will have awesome bandwidth and scalability, while McKinley's shared bus architecture is a throwback to Intel's legacy in low-end servers. Sun has long relied on multiprocessing to make up for SPARC's uniprocessor performance deficiencies.

                          Alpha    IA-64    PA-RISC  Power    SPARC
Uniprocessor Performance  A        B+       B        B+       C
System Bandwidth          A        B        C        A        C
MP Scalability            A        C        B        B        A
System Software Maturity  A        C        A        A        A
Application Support       C+       B        B        B-       A
3rd Party Support         B-       A        C        C        B-
Marketing Effort          C-       A        B        C        A
Technical Longevity       A        A        B        A        B
Economic Longevity        B        A        B        B+       A
Perceived Longevity       C        A        C        B        B+

Table 4. Relative Strengths of 64-bit High End MPU Families Looking Forward

Probably IA-64's weakest aspect is the relative youth and immaturity of its system software, such as compilers and operating systems, compared to its established RISC competitors. Although Intel is strongly supporting independent software vendor (ISV) efforts to port applications to IA-64, IA-64 application support will lag that of the popular SPARC architecture for a long time. Third party support means that the MPU is available to third party OEMs to build systems around. Alpha and SPARC MPUs and boards are available from Alpha Processor Inc. and Sun Microelectronics respectively. Of course, Intel will be pursuing primarily a merchant chip business model with IA-64. Marketing effort indicates how effective the sales organizations of the companies backing each architecture are at getting their message across to potential buyers through the trade press and general media.

One of the important considerations computer makers assess when committing to a 64-bit MPU architecture is the longevity of a processor line. Technical longevity is an indication of how 'future-proof' the architecture is and how credible the roadmap of future implementations seems in terms of staying competitive in performance and cost with other 64-bit high-end MPU families. Economic longevity considers the ability of the backers of each processor family to continue to invest in it to keep it competitive based on its business model and sales levels. What I call perceived longevity touches on the less reputable side of the computer business - the sport of spreading fear, uncertainty, and doubt (FUD) about competing products and vendors. Alpha has been particularly hard hit by a combination of effective fear mongering by competitors and the troubled history of DEC's final years before its acquisition by Compaq. Conversely Intel and HP have been effective in convincing many overly credulous individuals in the computer technical press that IA-64 represents something entirely new (EPIC) that will eventually supersede RISC.