The History of Alpha Processors: Facts and Comments

This work starts a series of articles dedicated to Alpha processors and their architecture, as well as to other related questions. We will look back at the times when Alpha processors were considered real kings from the performance perspective, and their future was expected to be truly bright. We will try to explain what happened, and why one of the most interesting and promising computer architectures has been thrown into oblivion.

by Pavel Bolotov
07/10/2005 | 09:18 PM

Dig my grave both long and narrow
Make my coffin neat and strong 
(from an old American song)

 

This work starts a series of articles dedicated to Alpha processors and their architecture, as well as to other related questions. We decided to make it a series because presenting all the available material in a single overview would be problematic and, from the author's point of view, generally inadequate.

Besides, the topic we are about to discuss is vast and fundamental in many respects, and few materials available right now discuss the entire architecture in detail rather than separate products released years apart.

Maybe this article would have looked better if it had been written and published several years ago, when Alpha processors were considered real kings from the performance perspective, and their future was expected to be truly bright. But only now does it seem to be the right time to draw the final line, to explain what happened, and why one of the most interesting and promising computer architectures has been thrown into oblivion.

Generally, this article is a historical overview with some elements of analysis, so it should be considered as such. We do not insist that it should be taken as universal, even though it contains a lot of reference information. On the other hand, it isn't an obituary notice or a funeral prayer, definitely...

PDP and VAX

Digital Equipment Corporation (abbreviated to DEC) was founded in 1957 by two engineers, Kenneth Olsen and Harlan Anderson, graduates of the Massachusetts Institute of Technology, and was one of the oldest and best-known companies in the world computer industry.

Before founding the company, Olsen worked at the institute's Lincoln Laboratory, which was supported by the US Department of Defense, and participated in the development of one of the world's first transistor-based computers, TX-2. Initially the company produced and sold backplane modules for computers, but in 1960 it offered its first computer, the 18-bit PDP-1 (Programmed Data Processor - 1), capable of about 100 thousand operations per second. By the way, that machine ran the first computer game in history, Steven Russell's Spacewar. The 12-bit PDP-8, introduced in 1964, deserved to be called the first mass-produced "minicomputer" (the size of a small wardrobe). The price was attractive too: about 18,000 USD (in 1965) for a standard configuration. Thanks to its excellent price/performance ratio, the PDP-8 competed quite successfully with the famous IBM mainframe systems. About 1450 machines had been produced by 1968 (not counting the numerous modifications that followed). The 36-bit PDP-10 was ready in the same year, 1968; it was based upon the design of the experimental PDP-6 and targeted at data processing centers, research laboratories, and military needs. Different versions of the PDP-10 were manufactured until 1983. Work on improving the 36-bit architecture continued within the Unicorn project under the supervision of Leonard Hughes and David Rogers, but the project was closed in June 1975, and all its resources were transferred to support another, 32-bit, architecture.

The 16-bit PDP-11 was launched into production at the beginning of the 1970s. It was DEC's first computer to use 8-bit bytes, and a direct successor to the PDP-8 product line. Thanks to a simple and well-conceived Unibus-based architecture (or a modified one, based upon Q-bus), a reasonably efficient instruction set, and, just as importantly, low manufacturing costs, the PDP-11 product line became a success. Of course, the PDP-11 was soon cloned all over the world, including the "countries of people's democracy": CM-4 (USSR, Bulgaria, Hungary), CM-1420 (USSR, Bulgaria, German Democratic Republic), CM-1600 (USSR), IZOT-1016 (Bulgaria), DVK (USSR). Many operating systems were developed for the PDP-11: DEC offered P/OS, RSX-11, RT-11, RSTS/E, and several derivatives of DOS; finally, the first release of UNIX was completed at Bell Laboratories on PDP-7 and PDP-11 machines in 1971, in assembler. The PDP-11 left the market during the 1980s for one inevitable reason: lack of address space. This is when a new architecture, 32-bit though still CISC, was brought to the market.

And that architecture was VAX (Virtual Address eXtension). It was approved officially during a VAX Architecture Committee session in April 1975. Work on the architecture had been going on for several months within the Star project supervised by Gordon Bell, in parallel with the Unicorn project mentioned above. Once both projects had been completed, DEC decided to cancel any further development of 36-bit systems and to concentrate the available resources on 32-bit VAXen (plural of VAX). In fact, the Star project was to prove the necessity of increasing the width of the PDP-11 general registers to 32 bits and their number from 8 to 16, and of significantly redesigning the instruction set. The first VAX machine, model 11/780, was announced in October 1977. A few months later, in February 1978, DEC released a new operating system for VAXen, VMS (Virtual Memory System) v1.0. It was a multi-user, multi-tasking OS that supported up to 64MB of main memory and offered built-in networking (DECnet), an adaptive task scheduler, extended process management, and many other innovations never seen before. Renamed to VAX/VMS, v2.0 was presented in April 1980, carrying numerous improvements. The classical UNIX was also ported to VAX pretty soon. VAXen were manufactured and sold very successfully during the 1980s, and were still shipped in limited quantities under special contracts close to the end of the century. The whole product line included several dozen models, ranging from compact workstations to 6-processor mainframe-class servers. Even nowadays, thousands of VAXen keep working in the Department of Defense and the NSA (National Security Agency) of the United States, as well as in numerous commercial organizations. Nevertheless, the epoch of VAXen was the 1980s, and in the 1990s DEC bet on a new architecture.

 

The PRISM Project

At the beginning of the 1980s DEC was at the peak of its financial wealth, mostly because of high revenues from the constantly growing sales of VAX machines. But nothing lasts forever, and it was obvious that some day VAX would have to leave the market in favor of a new architecture, as was happening with the PDP-11. In those days many companies started to pay more and more attention to RISC concepts and implementations, and DEC had no intention of ignoring that trend. Between 1982 and 1985, several groups inside DEC actively researched the RISC area:

In 1985, following Cutler's initiative to create a "corporate RISC plan", all 4 projects were merged into one, known as PRISM (Parallel Instruction Set Machine), and the first draft for a new RISC processor was released in August 1985. Here I would like to mention that in those days DEC participated in the development of the MIPS R3000 processor, and even initiated the creation of the Advanced Computing Environment consortium to promote that architecture in the market.

Therefore, it is no wonder that the new processor inherited many features of the MIPS architecture, though the differences were obvious as well. All instructions had a fixed length of 32 bits: the upper 6 and the lower 5 bits held the instruction code, and the remaining 21 were reserved for immediate data or addressing needs. There were 64 primary 32-bit general-purpose registers (MIPS had 32), 16 additional 64-bit vector registers, and 3 control registers for vector operations: two 7-bit ones (vector length and vector count) and one 64-bit one (vector mask). There was no processor status register, so the result of comparing two scalar operands was placed into a general-purpose register, while the result of comparing two vector operands went into the vector mask. There was no built-in floating-point unit. A set of special instructions (Epicode, or extended processor instruction code) was implemented in software, using loadable microcode, to handle special tasks required by the environment or operating system and not supported by the standard instruction set. Later on, this function was implemented for the Alpha architecture under the name of PALcode (Privileged Architecture Library code).
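The 6 + 21 + 5 bit split described above can be illustrated with a few shifts and masks. This is only a sketch of the arithmetic: the function name `split_prism_word` is hypothetical, and the actual meaning of each field in real PRISM instruction formats is not covered by this article.

```python
def split_prism_word(word):
    """Split a 32-bit word into (upper 6 bits, middle 21 bits, lower 5 bits).

    Illustrative only: the field boundaries follow the description in the
    text; the name and return layout are assumptions made for this sketch.
    """
    if not 0 <= word < 2 ** 32:
        raise ValueError("expected a 32-bit unsigned value")
    upper6 = (word >> 26) & 0x3F        # bits 31..26, part of the instruction code
    middle21 = (word >> 5) & 0x1FFFFF   # bits 25..5, immediate data or addressing
    lower5 = word & 0x1F                # bits 4..0, part of the instruction code
    return upper6, middle21, lower5
```

The three widths add up to 32 bits, so every bit of the word belongs to exactly one field.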

In 1988, while the project was still in progress, DEC's senior management decided to close it, considering any further financial support a waste of money. Cutler protested against that decision, resigned, and went to Microsoft to lead the department developing Windows NT.

At the beginning of 1989 DEC presented its first RISC-powered workstations: DECstation 3100 with a 32-bit MIPS R2000 clocked at 16MHz, and DECstation 2100 using the same processor type clocked at 12MHz. Both machines ran Ultrix OS and were fairly inexpensive (about 8,000 USD in 1990 for DECstation 2100).

The Alpha Project

In 1989, the aging VAX architecture was hardly able to compete with 2nd generation RISC architectures, such as MIPS and SPARC, and it was quite obvious that the next generation of RISC hardware would leave VAX few chances to survive. In the middle of 1989 DEC engineers were given the task of creating a competitive RISC architecture with long-term potential that would at the same time have as few incompatibilities with VAX as possible, because VAX/VMS and all accompanying applications had to be ported to the new architecture. It was also defined to be 64-bit right from the start, since competitors were about to release their 64-bit solutions at about the same time. A development group was created, with Richard Witek and Richard Sites as the chief architects.

The Alpha architecture was mentioned officially for the first time on February 25, 1992, during a conference in Tokyo. Most key features of the new architecture were also listed in a concise overview posted to comp.arch, a USENET conference. The overview mentioned that "Alpha" was an internal codename, and that an official name would be provided later. The new processor was a clean 64-bit RISC design that executed fixed-length instructions (32 bits each), had 32 64-bit integer registers, operated on 43-bit virtual addresses (with a possibility to expand up to 64 bits in future implementations), and used, like VAX, little-endian byte order (i.e. the low byte of a register occupies the lowest memory address, unlike big-endian byte order, introduced by Motorola and used in most processor architectures, where the low byte of a register occupies the highest memory address). A mathematical coprocessor was built into the core, with 32 64-bit floating-point registers that could be accessed in any order, unlike the primitive stack-based access implemented in Intel x87 coprocessors. The total lifetime of the new architecture was estimated at 25 years at least.
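The difference between the two byte orders is easy to see by serializing the same 32-bit value both ways. The sketch below uses Python's standard `struct` module; the sample value and variable names are arbitrary choices for illustration.

```python
import struct

value = 0x0A0B0C0D

# "<I" packs an unsigned 32-bit integer in little-endian order (as on VAX and Alpha):
# the low byte 0x0D lands at the lowest address.
little = struct.pack("<I", value)

# ">I" packs the same integer in big-endian order:
# the low byte 0x0D lands at the highest address.
big = struct.pack(">I", value)

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d
```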

The instruction set was simplified to facilitate pipelining as much as possible, and consisted of 5 groups:

It should be mentioned that there were no integer division instructions: division was the most complex operation and pipelined poorly, so it was emulated in software.
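To give an idea of what emulating division with simpler operations looks like, here is a minimal sketch of the classic shift-and-subtract (restoring) algorithm for unsigned integers. It is a textbook method shown for illustration only, not the routine DEC's software actually used; the function name `udiv` and its parameters are hypothetical.

```python
def udiv(dividend, divisor, bits=64):
    """Unsigned division using only shifts, comparisons and subtraction.

    Returns (quotient, remainder). Textbook restoring division, given here
    as an illustration; not the actual Alpha run-time routine.
    """
    mask = (1 << bits) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit down to bit 0.
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

Each loop iteration produces one quotient bit, which hints at why a hardware divider would be slow and hard to pipeline compared with simple additions and shifts.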

Alpha architecture was a "real" RISC (unlike modern processors with i386 architecture, which are RISC only inside). The conceptual difference between RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing) was (and still is) as follows:

Feature | CISC | RISC
Instruction length | Variable, depends on the instruction type | Fixed, doesn't depend on the instruction type
Instruction set | Wide, adapted for programmer's needs | Balanced, adapted for processor execution convenience
Memory access | Allowed for different kinds of instructions | Allowed for load/store instructions only
The processor was supposed to enter production at a very high frequency for the time, 150MHz, with an increase to 200MHz planned on the same production technology. This was possible thanks to the successful architecture, as well as to the engineers' decision to give up automated design tools and do all the design work manually.

The project entered the manufacturing stage, and was soon reorganized as a regular division of DEC.

DEC's marketing department came up with a name for the new architecture, AXP (or Alpha AXP), though no one knows for sure what exactly this abbreviation meant. Quite possibly, nothing at all: in the past, DEC had had legal problems with its VAX brand, because another company, a vacuum cleaner manufacturer, claimed the name, and the conflict was taken to court. By the way, it was also argued that DEC's equipment sales suffered because of the other company's slogan, "Nothing sucks like a Vax!" Eventually a joke appeared saying that AXP meant "Almost Exactly PRISM".

EV4, LCA4, EV45, LCA45

The first processor of the Alpha family was called 21064 (21 indicated that Alpha was an architecture of the XXI century, 0 - the processor generation, 64 - the computational capability in bits). It was also code-named EV4 (EV was [supposedly] the abbreviation of "Extended VAX", and 4 - the process generation, CMOS4, where CMOS stood for "Complementary Metal Oxide Semiconductor"). I have to point out that an EV4 prototype was ready in 1991; it used the CMOS3 process, which is why it featured smaller caches and no floating-point unit. Nevertheless, it was an important milestone for tuning and polishing the architecture and software environment. EV4 was introduced in November 1992, and was manufactured using a 3-layer 0.75µ process, advanced for those days (later it was moved to the 0.675µ CMOS4S, an optical shrink of CMOS4). It was designed for 3.3V and core frequencies ranging from 150MHz to 200MHz (TDP from 21W to 27W). The chip consisted of 1.68 million transistors and featured a 233mm² die. EV4 supported multi-processing, which was one of the key features of the architecture. It was designed in the PGA-431 (Pin Grid Array) form-factor.

The L1 cache was integrated: 8KB for instructions (I-cache, instruction cache), direct-mapped, and 8KB for data (D-cache, data cache), direct-mapped and write-through. The read latency of the D-cache was 3 clocks. Every I-cache line consisted of 32 bytes of instructions, a 21-bit tag record, an 8-bit branch history field, and several auxiliary fields. Every D-cache line consisted of 32 bytes of data and a 21-bit tag record. The L2 cache (B-cache, backup cache) was a recommended option built from external synchronous or asynchronous SRAM chips; it was direct-mapped, write-back, write-ahead, and sized up to 16MB (usually from 512KB to 2MB). Every line consisted of 32 bytes of data or instructions with a 1-bit longword parity or 7-bit longword ECC field, a tag record of up to 17 bits with an additional 1-bit parity protection, and a 3-bit status flag with an additional parity bit. The read and write speed of the B-cache was programmable and measured in processor clocks. The system data bus was either 64 or 128 bits wide (programmable, with a 1-bit longword parity or 7-bit longword ECC field), and was multiplexed with the B-cache data bus, switching as necessary. The system address bus was 34 bits wide. The B-cache was inclusive of the D-cache, i.e. it contained a full copy of the latter. A mechanism called "victim write" was used to store data from the B-cache to memory. Only the processor could perform read/write operations on the B-cache; the system logic could only read B-tag data (this was especially important for multi-processor systems, to maintain cache coherence among all processors in a machine).
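The numbers quoted for the L1 caches fit together arithmetically. The sketch below splits an address into tag, line index and byte offset for an 8KB direct-mapped cache with 32-byte lines. It assumes, purely for illustration, that the cache is addressed with 34-bit addresses (the width of the system address bus); under that assumption the tag comes out at exactly the 21 bits mentioned above. All names in the code are hypothetical.

```python
CACHE_BYTES = 8 * 1024   # 8KB cache, as described in the text
LINE_BYTES = 32          # 32 bytes per line, as described in the text
ADDRESS_BITS = 34        # assumption: 34-bit addresses, matching the address bus

NUM_LINES = CACHE_BYTES // LINE_BYTES           # 256 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 5 bits select a byte in a line
INDEX_BITS = NUM_LINES.bit_length() - 1         # 8 bits select a line
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # 21 bits remain for the tag


def split_address(addr):
    """Return (tag, line index, byte offset) for a direct-mapped lookup."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

In a direct-mapped cache, two addresses exactly 8KB apart, such as 0x0000 and 0x2000, share the same line index and differ only in the tag, so they evict each other.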

The processor had one integer pipeline (E-box, 7 stages) and one floating-point pipeline (F-box, 10 stages). The instruction decoder and scheduler (I-box) was able to supply up to 2 instructions per clock to the execution units, namely the E-box, the F-box, and the load/store unit (A-box). The cache and system bus controller (C-box) worked in cooperation with the A-box, and supervised both the integrated I-cache and D-cache and the external B-cache. The branch prediction unit maintained a 4096-entry branch prediction table, 2 bits per entry. The I-TLB had 8 entries for 8KB pages and 4 entries for 4MB pages, and the D-TLB had 32 entries; all were fully associative.
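A table of 2-bit entries is commonly used as a set of saturating counters, one per branch slot. The sketch below shows that general technique with 4096 entries. It is not a model of the EV4 hardware: the way the table is indexed by the branch address, the initial counter value, and the class name `TwoBitPredictor` are all assumptions made for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (generic textbook scheme).

    Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
    Indexing and initial state are assumptions, not EV4 specifics.
    """

    def __init__(self, entries=4096):
        self.entries = entries
        self.table = [1] * entries  # start at "weakly not taken" (assumption)

    def _index(self, pc):
        # Instructions are 4 bytes long, so drop the two low address bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

With two bits per entry, a branch that is almost always taken (a loop branch, for example) needs two consecutive mispredictions before the prediction flips, which makes the scheme tolerant of a single loop exit.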

 

Despite its excellent performance, EV4 was quite expensive for most potential customers, and thus its low-priced brother, 21066 (LCA4, or LCA4S), was released in September 1993. It was based on the EV4 core, but with additional integrated memory and PCI controllers, as well as several secondary functions. I have to stress, though, that the system data bus width was reduced to 64 bits, which affected performance negatively. LCA4 was manufactured using the 0.675µ CMOS4S process, with a die even smaller than that of EV4 (209mm² compared to 234mm²). It also worked at lower clock frequencies, 100MHz - 166MHz, presumably to avoid potential overheating issues in the poorly ventilated desktop cases of those days. Besides, DEC also tried to avoid creating an additional competitor to EV4. The newcomer contained 1.75 million transistors and required 3.3V. The design of this processor was licensed to Mitsubishi, which also manufactured LCA4 (including a 200MHz version).

21064A (EV45) was announced at Microprocessor Forum in October 1993. It was a modified EV4, produced using the 4-layer 0.5µ CMOS5 process. 21066A (LCA45) was presented at COMDEX'94 in November 1994. It was LCA4 modified in almost the same way as EV4 had been modified into EV45. Note that DEC's marketing people had developed a habit of adding a letter to the processor model name after a redesign for a more advanced process. In fact, the cores didn't undergo any dramatic changes: the I-cache and D-cache of EV45 became twice as big (16KB I-cache + 16KB D-cache), their data and tag fields gained a parity bit each, the branch history fields of the I-cache were expanded to 16 bits, the D-cache became 2-way set associative, and a 1-bit byte parity mode was added to the existing integrity modes of the system data bus. Also, both EV45 and LCA45 received a modified F-box. Namely, division was optimized: EV4 executed a division in 34 clocks for single-precision operands and in 63 clocks for double-precision operands, regardless of the operand values, whereas EV45 took 19-34 clocks for single-precision operands and 29-63 clocks for double-precision operands, depending on the operand values. LCA45 was also manufactured by Mitsubishi. The die sizes decreased to 164mm² for EV45 and 161mm² for LCA45. The transistor count increased to 2.85 million for EV45, and remained the same for LCA45 (1.75 million). Finally, power consumption per clock decreased for both processors, though the voltage remained unchanged (3.3V). The core frequencies of EV45 ranged from 200MHz to 300MHz (TDP from 24W to 36W), those of LCA45 - from 166MHz to 233MHz.

Since DEC developed equipment for the United States Department of Defense, the 66MHz 21068 and the 100MHz 21068A were introduced in 1994. They were based on LCA4 and LCA45 respectively, and were adapted for military needs (passive cooling, extreme temperature conditions, etc.).

The first chipsets for EV4 supported the TURBOchannel, FutureBus+, and XMI peripheral buses. Though all of them were high-speed designs for those days (about 100Mb/s per bus), they never became widespread, so only a very limited set of peripherals was available for them. Therefore, DEC paid special attention to industry-standard bus architectures, such as PCI and ISA (EISA). A new chipset, DEC Apecs, was introduced in 1994 in two modifications: for a 64-bit system data bus (21071) and for a 128-bit system data bus (21072). They differed in the number of chips: 21071 consisted of 4 chips (1 universal controller, 2 data slices, 1 PCI bus controller), while 21072 consisted of 6 (there were 2 additional data slices). They supported a 33MHz system bus frequency, up to 16MB of B-cache, and up to 4GB of FPM parity memory with access times from 100 to 50ns. Support for ISA or EISA buses could be implemented with standard bridges, such as i82378IB (ISA) or i82378EB (EISA).

The first Alpha workstation was available in November 1992. It was DEC 3000 Model 500 AXP (codenamed Flamingo), with EV4 150MHz, 512KB of B-cache, 32MB of main memory, 1GB SCSI HDD, SCSI CD-ROM, built-in 10Mbit Ethernet controller (thick coaxial and twisted pair), built-in sound and ISDN controllers, also a 19" monitor (1280x1024 8-bit). The price was shocking: $38,995.

EV5, EV56, PCA56, PCA57

DEC unveiled the very first information about its 2nd generation Alpha processor at the Hot Chips conference in Palo Alto (California), which started on August 14, 1994, although the official release of 21164 (EV5) took place on September 7, 1994, when DEC's press release was published. The processor was based on the EV45 core, and was an evolution of the latter rather than a revolutionary new design. The number of pipelines, both integer and floating-point, was doubled compared to EV4 or EV45. Also, the floating-point pipelines were shortened to 9 stages instead of 10. The integer pipelines were not identical, however: while both were capable of elementary arithmetical and logical operations, only the 1st could multiply and shift, and only the 2nd could process conditional and unconditional jumps. Also, both pipelines could calculate virtual addresses for load instructions, but only the 1st could do the same for store instructions. The floating-point pipelines differed as well: the 1st could execute any floating-point instruction except multiplication, which was the only kind of instruction the 2nd pipeline could process. The I-box was able to fetch and decode up to 4 instructions per clock to keep the execution units loaded. The processor was manufactured with the same 4-layer 0.5µ CMOS5 process as EV45, required 3.3V, contained 9.3 million transistors (including 7.8 million for the integrated cache areas), and featured a 299mm² die, which was very close to the theoretical limits of the process involved. Core frequencies ranged from 266MHz to 333MHz (TDP from 46W to 56W). The processor was manufactured in the IPGA-499 (Interstitial Pin Grid Array) form-factor.

The I-cache and D-cache were sized and organized just like those in EV4, i.e. 8KB each. The D-cache remained write-through, but it was made dual-ported, i.e. able to deliver data for 2 load instructions per clock. Sacrificing transistors for the sake of performance, the D-cache was physically composed of 2 identical parts of 8KB each, so data could be read from either of them but had to be written into both. The processor had 96KB of integrated L2 cache (S-cache, secondary cache), write-back and 3-way set associative, which the C-box accessed via a dedicated 128-bit data bus. At the same time, the B-cache was also functional (though it remained optional, consisted of external cache SRAMs, and could be as large as 64MB, though it usually ranged from 1MB to 4MB). In other words, EV5 supported 3 cache levels! The S-cache was accessed via a 4-stage pipeline: two clocks for tag search and modification, and two clocks for data access and delivery. Every S-cache line was 64 bytes wide (though it could also be addressed as two sub-lines, each 32 bytes wide), and had one tag per line. The D-cache read latency was reduced to 2 clocks, and the S-cache could deliver a full line in 7 clocks (as mentioned above, 4 clocks for the first 16 bytes, and 1 clock for each subsequent 16 bytes to fill the entire line). As in EV4, the contents of the D-cache were duplicated, although this time in the S-cache. Besides, the B-cache was made inclusive of the S-cache, regardless of the associativity differences. The I-TLB held 48 entries (for pages sized from 8KB to 4MB) and the D-TLB 64 entries; the latter became dual-ported like the D-cache. The system data bus had a fixed width of 128 bits (with an additional 16 bits for ECC protection), and was still multiplexed with the data path to the B-cache. The system address bus was 40 bits wide, and the control bus was 10 bits wide.
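The 7-clock figure for the S-cache follows directly from the numbers in the paragraph above: a 64-byte line is delivered in four 16-byte pieces, the first after 4 clocks and each of the remaining three after 1 more clock. A small sketch of this arithmetic, with hypothetical function and parameter names:

```python
def line_fill_clocks(line_bytes=64, chunk_bytes=16,
                     first_chunk_clocks=4, next_chunk_clocks=1):
    """Clocks needed to deliver a whole cache line, chunk by chunk.

    Default values are the S-cache figures quoted in the text; the function
    itself is only an illustration of the arithmetic.
    """
    chunks = line_bytes // chunk_bytes
    return first_chunk_clocks + (chunks - 1) * next_chunk_clocks


print(line_fill_clocks())  # 7
```

By the same arithmetic, a single 32-byte sub-line would take 4 + 1 = 5 clocks.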

 

21164A (EV56) was introduced at Microprocessor Forum in October 1995. It was a modified version of EV5 after a technology shrink to 0.35µ CMOS6, manufactured at the same factory in Hudson (DEC had invested about 450 million USD in the modernization of this facility). The most important architectural difference was BWX (Byte-Word Extension), a set of 6 additional instructions to load/store data in 8- or 16-bit quanta. Originally, the Alpha architecture could only load/store data in 32- or 64-bit quanta, which caused certain difficulties when porting or emulating code from other processor architectures, such as i386 and MIPS. A request to implement BWX in hardware was submitted in June 1994 by Richard Sites, and was approved in June 1995. Note that BWX required both the processor and the chipset to support it. EV56 was designed to work at core frequencies ranging from 366MHz to 666MHz (TDP from 31W to 55W), starting from the summer of 1996. Samsung also produced EV56 under a license agreement signed in June 1996 (the 666MHz version was shipped by Samsung only). The processor contained 9.66 million transistors, featured a 209mm² die, and required dual voltage (2.5V for the core and 3.3V for input-output circuits).
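To see why the lack of byte-sized stores was inconvenient, consider what storing a single byte involves when memory can only be written in 64-bit quanta: the enclosing quadword has to be loaded, one byte lane replaced, and the whole quadword written back. The sketch below models this read-modify-write idea on a Python `bytearray`. It illustrates the principle only and does not reproduce the actual Alpha instruction sequence; the function name is hypothetical.

```python
import struct


def store_byte_without_bwx(memory, addr, value):
    """Store one byte using only aligned 64-bit loads and stores.

    `memory` is a bytearray whose length is a multiple of 8. This models
    the read-modify-write principle, not real Alpha instructions.
    """
    base = addr & ~7             # address of the aligned quadword
    shift = (addr & 7) * 8       # bit position of the byte (little-endian)
    (quad,) = struct.unpack_from("<Q", memory, base)        # load the quadword
    quad &= ~(0xFF << shift) & 0xFFFFFFFFFFFFFFFF            # clear the target byte
    quad |= (value & 0xFF) << shift                          # insert the new byte
    struct.pack_into("<Q", memory, base, quad)               # store the quadword back


mem = bytearray(range(16))
store_byte_without_bwx(mem, 10, 0xAB)
```

After the call only `mem[10]` has changed. With BWX, the same effect is achieved by a single store instruction, which also avoids touching neighboring bytes.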

 

21164PC (PCA56) was introduced on March 17, 1997. It was a low-cost version of EV56, designed jointly by DEC and Mitsubishi. The S-cache and the accompanying logic were absent, but the I-cache became twice as big (16KB). The processor consisted of 3.5 million transistors, and its die measured 141mm². It was manufactured with the same process and required the same voltages as EV56. The form-factor did change, however: the newcomer was designed as IPGA-413 instead of IPGA-499. Its core frequencies ranged from 400MHz to 533MHz (TDP from 26W to 35W). Later on, Samsung also manufactured a 0.28µ 21164PC (PCA57), with an I-cache and a D-cache twice as big and a 2-way set associative D-cache. The transistor count increased to 5.7 million, but the die size decreased to 101mm². It required lower voltages: 2.0V for the core and 2.5V for input-output logic. The supported core frequencies lay between 533MHz and 666MHz (TDP from 18W to 23W).

Besides the BWX instructions inherited from EV56, PCA56 supported a new set, MVI (Motion Video Instructions), intended to accelerate video and audio calculations using SIMD (Single Instruction - Multiple Data) approach, somewhat comparable to MMX.

The first standard chipset developed for EV5 was DEC Alcor (21171). It supported a 33MHz system bus, up to 64MB of B-cache, up to 8GB of FPM ECC memory (using a 256-bit wide memory bus), and a 64-bit PCI bus (33MHz). ISA or EISA bus support could be implemented with a standard bridge, as usual. There was no built-in IDE controller (one could be added separately, using a third-party chip). The chipset consisted physically of 5 chips: 1 universal controller (including PCI bus support) and 4 data switches. When EV56 production began, DEC released a new modification of Alcor with BWX support, Alcor 2 (21172). The next member of this chipset dynasty was Pyxis (21174), a single-chip solution supporting a 66MHz system bus and 66MHz SDRAM ECC memory (although using a 128-bit memory bus). For PCA57-based systems, the VLSI Polaris chipset was also developed.

The End of DEC

On January 26, 1998, the computer world heard the news that DEC, being in critical financial condition, was to be purchased by Compaq, and that the deal was about to be approved by the upcoming shareholder meetings of both companies. DEC's shareholders ratified the agreement on February 2, 1998. The deal was worth 9.6 billion USD, compared to DEC's estimated market capitalization of about 7 billion USD. The integration of DEC's functional units into Compaq's business structure was finished in about half a year, bringing DEC to its official end: its shares were taken off the New York Stock Exchange on June 11, 1998. I would like to draw your attention to the fact that negotiations between DEC and Compaq had started back in 1995, but came to nothing in 1996 because DEC's senior management insisted on a merger, not an acquisition. Nevertheless, you have every right to ask: how could it happen that a huge company (according to the reports of 1989 it had almost 130 thousand employees and annual gross revenue of about $14 billion, i.e. it was the second company in the industry after IBM) with very high R&D potential and significant manufacturing facilities had to sell itself to a large system builder from Texas? There is no definite answer to this question, though the reasons mentioned have been very numerous. Let's talk a bit more about this.

Long ago, Kenneth Olsen, a founder, president and CEO of DEC until almost the very end, said that well-engineered products would sell themselves, implying there was no need for advertising campaigns or other marketing promotion. He also remarked that there was no reason anyone would want a computer at home. Perhaps these thoughts were correct in the "good old times", when computer equipment was manufactured in limited quantities by professionals and for professionals. But this was definitely not the case at the end of the XX century, when millions of computers were sold every year, and any mainstream computer could be assembled in an hour at most with just a screwdriver and parts from the nearest computer shop. One could even purchase the whole system already assembled in the same shop, with free delivery to the door. And since such a mainstream machine would most likely be purchased not by a professional manager who knows what TCO (Total Cost of Ownership) means, but by some Mary the baker or little Johnny who cannot tell a transistor from a resistor, such customers could hardly be motivated by the engineering advantages of their potential purchase. Mistake #1.

Even at the very beginning of the Alpha architecture's history, DEC's senior management made a great strategic mistake. As is known, the first prototypes of EV4 were presented at a computer conference in February 1991. Among the attendees were engineers from Apple Computer, who were looking for a new processor architecture to power the company's new computers and were impressed by the advantages of EV4. John Sculley, Apple's CEO, met with Kenneth Olsen in June of the same year and proposed using DEC's new processor in future Macs. Olsen refused the offer, saying that the processor was not ready for the market yet and that the VAX architecture hadn't yet exhausted its potential. Several months later, rumors said that new Macs would be powered by PowerPC processors developed jointly by Apple, IBM and Motorola. William Demmer, a former VP of the VAX and Alpha divisions who resigned in 1995, said later in an interview with Business Week (April 28, 1997): "Ken did not want the company's future to run on Alpha." Mistake #2.

DEC manufactured Alpha processors, as well as the accompanying chipsets and numerous peripherals, at its own factory in Hudson (Massachusetts). It designed and produced mainboards (in a limited number of modifications) exclusively for desktops (they were even called Evaluation Board or AlphaPC). None of these boards supported SMP, though almost all Alpha servers by DEC were multi-processor systems. All the mainboards were very well engineered, but they cost quite a lot, just like the Alpha processors. Their schematics were publicly available, so several companies (Aspen, Polywell, Enorex, etc.) manufactured fully qualified clones; the only company to develop its own design was DeskStation. In general, it is safe to say that DEC considered the production of its own workstations and servers its top priority, but did not take seriously the market of computer components for those same workstations and servers. It's possible to survive like that, but not to conquer the market. Mistake #3.

Despite all its attempts, DEC didn't manage to make its products (processors, chipsets, and mainboards in the first place) affordable for the majority of potential customers. For example, at the beginning of 1995, the 266MHz and 300MHz EV5 were offered for $2,052 and $2,937 respectively, in 1,000-unit quantities. That was an enormous price, even allowing for an average manufacturing cost of about $430 per chip (as estimated by MPR). In terms of price per "unit" of SPECint92, EV5 cost about twice as much as the competing RISC designs! At the same time, Alcor, the standard chipset for EV5, was offered at a much lower price: $295 apiece in 5,000-unit quantities. Yet the only Alcor-based mainboard from DEC (EB164, with 1MB of B-cache), bundled with a processor and 16MB of main memory (which, by the way, was not enough to run most applications even in those days), was offered for about $7,500. Mistake #4.

Although Alpha was declared an "open architecture" right from the start, there was no consortium for its further development. All R&D was handled by DEC itself, sometimes in cooperation with Mitsubishi. In fact, though the architecture was free de jure, its most important hardware designs were pretty much closed de facto, and had to be licensed for a fee (if they could be licensed at all). So, this wasn't really helping to promote the architecture. Note that soon after the introduction of EV4, DEC's management offered Intel, Motorola, NEC, and Texas Instruments licenses to manufacture the processor. But all these companies were already involved in other projects, and EV4 was of little or no interest to them. So, they refused. Perhaps the licensing conditions offered were also unacceptable, or maybe there was some other reason. Mistake #5.

After all, even the fastest computer without an operating system and accompanying software is just an expensive source of noise and a room heater. DEC targeted its Alpha hardware at Windows NT, Digital UNIX, and OpenVMS, in exactly this order of preference. That might have been fine, but...

Windows NT was an operating system designed for users, not for programmers (since it contained no integrated software development tools), which is why it depended heavily on precompiled applications, commercial ones in the first place. In fact, the numbers of Alpha-ready and i386-ready software titles differed severalfold. The launch of FX!32 in 1996 could probably have saved the situation to some extent, as it was an excellent emulator and translator of x86 code to Alpha, developed by Anton Chernoff's team. However, compared with applications compiled natively for Alpha, it caused a performance drop of about 40%. Then there were drivers, and FX!32 was of no help there. Since very few developers agreed to work on driver versions for Alpha, all hopes could be pinned on Microsoft and DEC only. Finally, Windows NT (3.51 as well as 4.0) was a 32-bit OS even when it ran on 64-bit Alpha hardware, which is why it was unable to take full advantage of the latter's potential. But all these issues didn't prevent DEC from promoting its Alpha systems with the slogan "Born to run Windows NT". All in all, this OS shouldn't have been positioned as the primary OS for the Alpha architecture, though having it available as an option was a big plus for the architecture. Mistake #6.

OpenVMS and Digital UNIX (also known as DEC OSF/1, and later as Compaq Tru64 UNIX), two reliable and scalable commercial operating systems from DEC, didn't become very popular because of their high prices (for example, over $1,000 for one copy of Digital UNIX in 1997) and closed source code. Moreover, these operating systems were not free from a few other drawbacks (such as hardware support even more limited than that of Windows NT). Had either of these operating systems been made freely available together with DEC's excellent software development tools, it could have played a significant role in strengthening Alpha's position in the market. Mistake #7.

DEC didn't support free open-source operating systems, though the very first of them, NetBSD, was ported to Alpha in 1995, followed by Linux, OpenBSD and FreeBSD. This was strange, to say the least, because these operating systems were (and still are) very popular in the Alpha environment. Besides, their market value was obvious even in those days, and was increasing constantly. These operating systems also performed no worse than the commercial Digital UNIX or OpenVMS, offered hardware support comparable to that of Windows NT (much better from today's perspective), and provided many other benefits you may expect from open-source software. Mistake #8.

The list of DEC's strategic mistakes could be continued, including a complete disregard for the revolution in mainstream and budget personal computers, an over-diversified business model, and other, less important mistakes not directly related to the Alpha architecture. So, we believe the final conclusion could be put as follows: DEC worked hard to make as much money as possible from the Alpha architecture, but did hardly anything to help the architecture itself.

DEC's numerous failures during the late 1980s and early 1990s prompted the board of directors to remove Olsen from the management of the company in June 1992 and to appoint Robert Palmer in his place. In 1994 Palmer undertook a desperate attempt to reorganize the company's structure and management, turning the existing "matrix" model (in which functionally different departments worked closely together on every decision) into a traditional "vertical" one (with authority and responsibility clearly defined from the very top of the structure to the very bottom). From 1991 to 1994, DEC's net losses exceeded $4 billion, including $2 billion just from July 1993 to June 1994 (including $1.2 billion spent on restructuring). The number of employees was cut to 85,000. According to Palmer's program, the company was to get rid of many divisions considered non-priority, and this is when the global sale began. In July 1994, the Storage Business Unit, which manufactured disk and tape drives, was sold to Quantum for $400 million, soon after the first models of thin-film hard drives (RA90 and RA92) had suffered a complete fiasco (they were late to market because of numerous design flaws, and didn't survive the competition). In August 1994, the Database Software Unit was sold to Oracle for $100 million, and around the same time a 7.8% stake in Italy's Olivetti was sold for $140 million. In November 1997, a deal was arranged to transfer the Network Product Business Unit to Cabletron for $430 million.

The fall of DEC was loud enough. The company sued Intel in May 1997, accusing it of infringing 10 patents on the Alpha architecture in its work on the Pentium, Pentium Pro and Pentium II processors. Intel filed a lawsuit against DEC in September 1997, claiming that 14 of its patents had been infringed in DEC's work on Alpha processors. Peace was finally reached on October 27, 1997: both companies withdrew their complaints. DEC licensed to Intel the rights to manufacture all of its hardware (except the Alpha line) and agreed to support the future IA-64 architecture. Intel in turn purchased from DEC the factory in Hudson together with the design centers in Jerusalem (Israel) and Austin (Texas) for $625 million, and agreed to manufacture DEC's Alpha processors in the future. Additionally, a 10-year patent cross-licensing agreement was signed. The deal was closed on May 18, 1998. By that time, Compaq had absorbed DEC's primary divisions, including 38,000 employees (before the acquisition Compaq had only 32,000 employees of its own), though many of them were laid off in the very near future.

I have to stress that shortly before DEC's end and soon after it, many leading DEC engineers left for other employers: Derrick Meyer joined AMD to design K7. James Keller also went to AMD, but as a K8 architect. Daniel Leibholz was hired by Sun to create UltraSPARC V. Richard Sites, one of the primary Alpha architects for all these years, also abandoned ship. Intel was far less lucky in this respect: the StrongARM architecture, inherited from DEC, seemed to be at a dead end, because none of the chief architects who had designed StrongARM-110, namely Daniel Dobberpuhl, Richard Witek, Gregory Hoeppner and Liam Madden, decided to join the new owner. Moreover, Witek's entire team, which was working in Austin on the second generation of the StrongARM core, resigned, so Intel had to design the core literally from scratch, using its own engineers who had previously worked on i960.

EV6, EV67, EV68C, EV68A

Although the 21264 (EV6) processor was developed by DEC, and was first mentioned at Microprocessor Forum in October 1996, the final silicon implementations were completed only in February 1998, when DEC's liquidation was already in full swing. The processor itself was a significant step forward compared to EV5, revolutionary in many aspects. One of the most important innovations was out-of-order execution, which implied a fundamental core redesign and reduced the dependence of the functional units on cache and main memory bandwidth. EV6 could reorder up to 80 instructions on the fly, and that was much more than competing products could offer (say, the Intel P6 architecture utilized out-of-order execution for up to 40 micro-operations, HP PA-8x00 - up to 56, MIPS R12000 - up to 48, IBM Power3 - up to 32, and PowerPC G4 - up to 5; Sun UltraSPARC II didn't support instruction reordering at all). Out-of-order execution was accompanied by register renaming, for which 48 integer and 40 floating-point additional physical registers were implemented (the number of logical, i.e. programmer-visible, registers remained unchanged).
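
To illustrate what register renaming buys, here is a minimal sketch in Python. It is not a model of the actual EV6 hardware: the class and method names are invented for this example, and the only numbers taken from the text are the 48 additional integer registers on top of Alpha's 32 logical integer registers.

```python
from collections import deque

LOGICAL_INT_REGS = 32      # integer registers visible to Alpha software
EXTRA_PHYSICAL_REGS = 48   # additional integer registers quoted in the text


class RenameMap:
    """Toy register renamer (hypothetical, for illustration only)."""

    def __init__(self, logical=LOGICAL_INT_REGS, extra=EXTRA_PHYSICAL_REGS):
        self.physical_count = logical + extra
        # Initially logical register i lives in physical register i.
        self.mapping = list(range(logical))
        # All remaining physical registers are free to hold new results.
        self.free = deque(range(logical, self.physical_count))

    def rename(self, dest, sources):
        """Rename one instruction: return (new_dest, renamed_sources, old_dest)."""
        # Sources are read through the current mapping first.
        renamed_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register: issue must stall")
        new_dest = self.free.popleft()
        old_dest = self.mapping[dest]
        self.mapping[dest] = new_dest
        return new_dest, renamed_sources, old_dest

    def retire(self, old_dest):
        """When an instruction retires, its destination's old register is freed."""
        self.free.append(old_dest)


rmap = RenameMap()
# Two consecutive writes to logical r1 receive different physical registers,
# so the second write does not have to wait for readers of the first.
first = rmap.rename(dest=1, sources=[2, 3])
second = rmap.rename(dest=1, sources=[1, 4])
print(rmap.physical_count, first, second)
```

With these figures the integer register file comes to 32 + 48 = 80 entries; the same arithmetic gives 32 + 40 = 72 entries on the floating-point side.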

The number of integer pipelines was increased to 4 (organized in 2 clusters), but they were somewhat different functionally: the 2nd pipeline could multiply (7 clocks per instruction) and shift (1 clock), while the 4th could execute MVI code (3 clocks) and shift. Besides, all 4 pipelines supported elementary arithmetic and logical operations (1 clock). Every cluster featured an integer register file of its own (80 entries, as mentioned above), but the two files were identical (synchronized). The 1st and the 3rd pipelines also handled some tasks of the A-box, calculating virtual addresses for load/store instructions. The A-box itself worked with the I-TLB and D-TLB (128 entries each), the load and store queues (32 instructions each), and 8 64-byte buffers (the miss address file) for operations with the B-cache and main memory. The floating-point pipelines were also functionally different: the 1st supported addition (4 clocks), division (12 clocks for single precision and 15 clocks for double precision), and square root calculation (15 and 30 clocks), while the 2nd was only capable of multiplying (4 clocks). By the way, the square root unit and all the corresponding instructions were new to the Alpha architecture. Just like in EV5, the decoder submitted up to 4 instructions per clock, and the scheduler distributed them between 2 queues: one for the integer pipelines (I-queue, 20 instructions), and one for the floating-point pipelines (F-queue, 15 instructions). Besides square root calculation, prefetch instructions and instructions for data transfer between integer and floating-point registers were also introduced.

The C-box was redesigned significantly: now it supported only 2 cache levels. The on-die L1 consisted of a 64KB I-cache and a 64KB D-cache, both 2-way set associative with 64-byte lines. The D-cache was write-back, though its contents were still duplicated in the B-cache. Because of the larger size and the more complicated associativity policy, D-cache read/write latencies increased to 3 clocks (to/from an integer register) and 4 clocks (to/from a floating-point register). The D-cache remained dual-ported, though unlike in EV5 it wasn't composed of 2 identical parts, but was a single array clocked at twice the core frequency. The external B-cache, from 1MB to 16MB in size, was direct-mapped and write-back; it used an independent 128-bit bidirectional data bus (with additional 16-bit ECC protection) and an independent 20-bit unidirectional address bus. It was built first from LW SSRAM (late write) chips, and later from DDR SSRAM (double data rate) chips. The B-cache working frequency could be set from 2/3 to 1/8 of the full core frequency, and unlike in the previous generations of Alpha processors, the B-cache was no longer optional. The system data bus was only 64 bits wide (with additional 8-bit ECC protection) and bidirectional, but used DDR technology. The system address bus was 44 bits wide, implemented physically through two 15-bit unidirectional channels, with no DDR support. The system control bus was 15 bits wide, and also did not support DDR. The basic working principle of the system bus was changed: the bus became dedicated (instead of shared), so that every processor had its own path to the chipset.
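
The cache parameters above translate into set counts and address-bit splits by simple arithmetic. The short Python sketch below works them out; the function names are made up for this example, and the closing remark about the B-cache address bus is only one possible reading of the figures in the text, not a documented fact.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (lines, sets, offset_bits, index_bits) for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    # The bit-counting trick below is only valid for power-of-two sizes.
    assert line_bytes & (line_bytes - 1) == 0 and sets & (sets - 1) == 0
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return lines, sets, offset_bits, index_bits


def addressable_units(size_bytes, bus_bits):
    """How many bus-wide chunks fit into a memory of the given size."""
    return size_bytes // (bus_bits // 8)


# On-die L1 caches of EV6: 64KB, 2-way set associative, 64-byte lines.
l1 = cache_geometry(64 * 1024, ways=2, line_bytes=64)

# Largest B-cache (16MB) seen through the 128-bit data bus.
bcache_chunks = addressable_units(16 * 1024 * 1024, bus_bits=128)

print(l1, bcache_chunks, bcache_chunks.bit_length() - 1)
```

Each L1 cache therefore holds 1024 lines in 512 sets, addressed with 6 offset bits and 9 index bits. A 16MB B-cache contains 2^20 chunks of 16 bytes, which is consistent with a 20-bit address bus paired with a 128-bit data bus.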

The branch prediction logic was redesigned completely. It followed a 2-level scheme with a local history table of 1024 records of 10 bits each, a local predictor of 1024 records of 3 bits each, a global predictor of 4096 records of 3 bits each, and a path history of 12 bits. The two algorithms worked independently: while the local one tracked the history of each individual branch, the global one tracked sequences of branches. The branch predictor analyzed the results of both algorithms and recorded its conclusions in a separate choice predictor of 4096 records of 2 bits each, which determined the preferred decision when the two predictions differed. Such a cooperative approach achieved better results than either algorithm used on its own.

During EV6 development the clock generation and distribution subsystem was redesigned entirely, because the functional units were numerous and interacted with one another in a very complex manner. More efficient signal flow allowed the core to work at the frequencies of the much simpler EV56 core on the same manufacturing process. Overall, the EV6 clock subsystem accounted for about 32% of the total core power. To give you a better idea, let me make a few comparisons: for EV56 it was about 25% of the total core power, for EV5 - about 37%, for EV4 - about 40%.

EV6 was manufactured using the same 6-layer 0.35µ CMOS6 process as EV56, and consisted of 15.2 mln transistors (including about 9 mln for the I-cache, D-cache, and branch predictors). Its die measured 314mm² and required a supply voltage of 2.1V-2.3V. The core frequencies ranged from 466MHz to 600MHz (TDP approx. from 80W to 110W). The processor came in a PGA-587 (Pin Grid Array) package.

21264A (EV67) entered the market at the end of 1999. It was produced by Samsung on a 0.25µ CMOS7 process and featured a 210mm² die. It required a lower supply voltage of 2.0V. It had no architectural differences from EV6. The core frequencies ranged between 600MHz and 833MHz (TDP approx. from 70W to 100W), which allowed Alpha to regain leadership in integer tasks, lost not long before to Intel and AMD processors.

The first samples of 21264B (EV68C) were delivered at the beginning of 2000. They were produced by IBM using a 0.18µ CMOS8 copper-interconnect process. Despite the absence of any architectural differences, the promising technology allowed raising the core frequencies all the way to 1250MHz. In 2001, Samsung was able to mass-produce 21264B (EV68A) using its own 0.18µ aluminum process, which allowed reducing the die size to 125mm² and the voltage to 1.7V. As a result, it managed to reach core frequencies between 750MHz and 940MHz (TDP approx. from 60W to 75W).

Various sources mention 21264C and 21264D, codenamed EV68CB and EV68DC respectively, manufactured by IBM with the same technology as EV68C and running at the same frequencies, so they could be considered minor modifications of it. The only noticeable difference was a new package, the "pinless" CLGA-675 (Ceramic Land Grid Array), used instead of PGA-587.

There were 2 chipsets designed for the 21264 series of processors: DEC Tsunami (21272; also known as Typhoon) and AMD Irongate (AMD-751). In fact, there could have been many more chipsets, since both 21264 and Athlon used almost the same system bus (AMD licensed it from DEC).

DEC Tsunami was a highly scalable chipset. It could be used to build single-processor as well as dual-processor and quad-processor systems, with a memory bus width from 128 to 512 bits (registered SDRAM with ECC, 83MHz), and with one or several PCI buses (64-bit, 33MHz). This flexibility was achieved by splitting the chipset into individual components: system bus controllers (C-chips, one per processor), memory bus controllers (D-chips, one per every 64 bits of bus width), and PCI bus controllers (P-chips, one per bus). So it is no wonder that some systems (for example, AlphaPC 264DP) had chipsets consisting of 12 separate chips...
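
The per-component rules just listed can be turned into a small chip-count calculator. The Python sketch below applies those rules exactly as stated in the text; the function name is made up, and the configurations are hypothetical examples, since the text does not give the actual breakdown of any particular board.

```python
def tsunami_chip_count(processors, memory_bus_bits, pci_buses):
    """Count Tsunami chips using the rules as stated in the article.

    One C-chip per processor, one D-chip per 64 bits of memory bus width,
    one P-chip per PCI bus.
    """
    if processors not in (1, 2, 4):
        raise ValueError("the text mentions 1-, 2- and 4-processor systems")
    if not 128 <= memory_bus_bits <= 512 or memory_bus_bits % 64:
        raise ValueError("memory bus width must be 128..512 bits, in 64-bit steps")
    if pci_buses < 1:
        raise ValueError("at least one PCI bus is required")
    c_chips = processors
    d_chips = memory_bus_bits // 64
    p_chips = pci_buses
    return {"C": c_chips, "D": d_chips, "P": p_chips,
            "total": c_chips + d_chips + p_chips}


# Smallest configuration the rules allow.
minimal = tsunami_chip_count(processors=1, memory_bus_bits=128, pci_buses=1)
# A hypothetical dual-processor configuration that adds up to 12 chips.
dual = tsunami_chip_count(processors=2, memory_bus_bits=512, pci_buses=2)
print(minimal, dual)
```

Under these rules, even the smallest system needs 4 chips, while a single-chip north bridge needs exactly one.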

Although AMD Irongate was developed to serve as a North Bridge for Athlon-based mainboards, it was used in some Alpha mainboards (for example, in UP1000 and UP1100). Being a single-chip solution, it cost much less than DEC Tsunami and consumed much less power. But it wasn't the best solution for 21264, because it lacked multi-processor support and had a narrow memory path (64-bit, unbuffered SDRAM with ECC, 100MHz). Nevertheless, Irongate was the first chipset for Alpha to feature AGP bus support. And the last.

The Compaq Epoch

In fact, Compaq purchased the remains of DEC for its substantial assembly facilities, its wide distribution network (in 98 countries), and that cross-licensing agreement with Intel (which, for example, allowed it to manufacture 8-processor Profusion servers). Later on it turned out that the division developing the Alpha architecture wasn't really welcome: Compaq had been producing workstations and servers based on Intel processors for a very long time, and also paid special attention to AMD products. So, Compaq established an alliance with Samsung in June 1998 to develop the Alpha architecture (by the way, DEC and Samsung had signed an agreement in February 1998, according to which Samsung received full access to all Alpha-related patents, and could manufacture already developed Alpha processors as well as design new ones on its own). They established a new joint venture called API (Alpha Processor Inc.) to promote the architecture (someone must have drawn the right conclusions from DEC's previous experience). In the summer of 1998, EV6-based systems entered mass production, featuring the best price-to-performance ratios among the competing products available on the market. Serious problems with the upcoming Intel Itanium gave every reason to assume that the situation would remain like that in the near future. Besides Samsung, EV6 processors were also manufactured by Intel at its Hudson Fab-6, according to the final agreement with the former DEC...

In 1999 Compaq suffered significant sales drops in the personal computer market. The most frequently named reason was its underestimation of the possibilities the Internet offered for promoting and selling PCs. Dell, by contrast, had adapted its business model accordingly and offered the most attractively priced computer equipment among all the top brands. Compaq's CEO, Eckhard Pfeiffer, resigned after a financial disaster in Q1 1999. Trying to reduce losses, Compaq started to minimize its presence in certain areas, and that affected Alpha systems: in May 1999, the AlphaServer assembly line in Salem (New Hampshire) was officially shut down.

On August 23, 1999, a notorious event took place: Compaq ended its participation in the development of Windows NT, stopped supplying this OS with its own Alpha systems, and, in fact, laid off a team of about 120 programmers from the former Western Research Laboratory of DEC (DECwest) working on this project. According to Compaq's statistics on the operating systems preinstalled on Alpha systems, Tru64 was used in 65% of the systems, OpenVMS - in 35%, and Windows NT in just about 5%, so there was no reason to keep flogging a dead horse. A week later, Microsoft announced that no Windows 2000 for Alpha would be released. Considering the fact that Microsoft had given up the support of the PowerPC and MIPS architectures in 1997, the future of a "universal OS" was tied to a single architecture, besides IA-64...

To ensure the leadership of the Alpha architecture in the future, Compaq and Samsung signed a memorandum in December 1999. The parties agreed to invest a total of 500 mln USD in the architecture (Samsung was to invest 200 mln USD in the development and tuning of new manufacturing processes, and Compaq was to invest 300 mln USD in the design of new server solutions and further Tru64 UNIX development). Also, in the same month Compaq and IBM agreed that the latter would manufacture Alpha processors using its own copper-interconnect technology once this technology was ready. At the same time, Samsung would remain the primary supplier of Alpha processors. Compaq's annual results are best illustrated by the change in its share price: it fell from $51 in February to $28 in December. Many analysts, though, stated that it could have been much worse.

The year 2000 was a quiet one for Compaq. Samsung failed to finalize its 0.18µ aluminum process. IBM, however, started supplying EV68C to Compaq in limited quantities, so the market had to be content with the comparatively slow EV67 for a while. The development of 21364 (EV7, also known as Marvel) was still in progress, though 21464 (EV8, also known as Araña) had already been mentioned in a few announcements. The collapse of dot-com businesses affected Compaq's shares, whose price dropped to $15 per share by December, i.e. by 44% since January. Strange as it may seem, that was a good result; other companies, more dependent on e-commerce, lost much more: Gateway - 75%, Apple - 71%, Dell - 65%. The dot-coms themselves were either bankrupt or close to it; Yahoo.com lost 95% of its market value, Priceline.com - 97%.

At the beginning of 2001, Samsung started to manufacture EV68A in volume, but the right moment had passed. Compaq planned to ship EV68C-based systems (GS-class AlphaServers), and to upgrade those already in production. EV7 was still nowhere to be seen when something completely unexpected happened: on June 25, 2001 ("black Monday"), Compaq announced the complete shift of its server solutions from Alpha to the IA-64 architecture by 2004. EV8 was cancelled immediately (though some details about its working principles had been discussed at Microprocessor Forum in October 1999), and EV7 was scheduled to come out at the beginning of 2002 at the earliest. After that the Alpha Microprocessor Division was to be disbanded, and most of its employees were to move to Intel. Samsung and IBM soon stopped producing Alpha processors. Later, it got even more interesting: on September 3, 2001, Hewlett-Packard announced its intention to acquire Compaq, which was experiencing financial difficulties: its share price stood at $10 in December 2001. The deal was approved by the shareholders' meetings of both corporations, as well as by the governments of the USA and Canada, and closed in May 2002.

On October 21, 2001, API (renamed by then to API NetWorks) transferred all rights to support Alpha systems (including warranty service) to Microway, the largest builder of Alpha workstations and servers after Compaq and an old partner of the former DEC. API itself left the Alpha products market and concentrated its efforts on network technologies, the development of the HyperTransport bus, and data storage systems.

In conclusion I could say that though Compaq avoided many of DEC's mistakes, it still didn't unlock the full power of the architecture. High-performance Alpha systems based on 21264A and 21264B never reached the sub-$2000 price category, and the low-cost 21264PC never appeared at all. Low-cost mainboards based on AMD Irongate never appeared in volume, and the expensive DEC Tsunami, offered by Compaq for over $1000 per unit in OEM quantities, prevented Alpha systems from entering the mainstream market segment. Other manufacturers of AMD Athlon chipsets didn't adapt them for 21264, though VIA initially had such an intention.

EV7, EV79, EV7z, EV8

The first news on the 21364 (EV7) architecture came from Microprocessor Forum in October 1998. It said that the processor would be based on the EV6 core, but with an integrated Direct Rambus DRAM controller (presumably, 4-channel) and an L2 cache (1.5MB, 6-way set associative). It was also mentioned that there was no intention to modify the EV6 core, though there could have been another reason for that: no one could take on this hard task, because there were not many design engineers left at Compaq. The design was expected to be completed by 2000.

Having acquired Compaq, HP inherited the Alpha architecture, which was hardly of interest to it at this point, because it was working on its own 64-bit PA-RISC architecture (Precision Architecture RISC) and was in an alliance with Intel to develop the IA-64 architecture (i.e. Itanium). So, HP's actions regarding the Alpha architecture were limited to selling the EV6/EV67/EV68-based servers inherited from Compaq, and launching into production EV7, finally presented in January 2002.

As expected, EV7 featured the EV68 core (absolutely unchanged) with several additional integrated units: two memory controllers (two Z-boxes, for Direct Rambus DRAM PC800), a multi-functional router (R-box, for multi-processor support and networking), and a full-speed L2 cache (S-cache, 1.75MB, 7-way set associative). The S-cache bus was 128 bits wide, and the cache itself worked with significant latencies (12 clocks for reading). Both Z-boxes and the R-box were clocked at 2/3 of the core frequency. The memory channels ran at half the Z-box frequency (i.e. at 1/3 of the core frequency), but used DDR technology.
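
The clock ratios above can be checked with a few lines of Python. This is only a worked example of the arithmetic: the function name is made up, and the 1200MHz core clock is a hypothetical value chosen because it lands exactly on the PC800 data rate.

```python
from fractions import Fraction


def ev7_memory_clocks(core_mhz):
    """Derive Z-box clock, channel clock and channel data rate from the core clock."""
    core = Fraction(core_mhz)
    zbox_mhz = core * Fraction(2, 3)   # Z-boxes run at 2/3 of the core clock
    channel_mhz = zbox_mhz / 2         # channels run at half the Z-box clock
    transfers = channel_mhz * 2        # DDR: two transfers per channel clock
    return zbox_mhz, channel_mhz, transfers


zbox, channel, rate = ev7_memory_clocks(1200)   # hypothetical core clock
print(f"Z-box {zbox} MHz, channel {channel} MHz, {rate} MT/s")

slow = ev7_memory_clocks(1000)                  # lowest clock shipped by HP
print([float(x) for x in slow])
```

In other words, the channel data rate always equals 2/3 of the core frequency, so a core clocked below 1200MHz drives PC800 memory below its rated 800 million transfers per second.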

Every Z-box supported 5 memory channels (4 primary and 1 auxiliary), each 18 bits wide (16 bits for commands/data/addresses, 2 bits for ECC). The auxiliary channel was optional, and could be used to organize a fault-tolerant array in memory (roughly speaking, like RAID3). For example, when a quadword (64 bits) was written to memory, it was divided into 4 words (16 bits each), each of which was sent through a dedicated channel, while the auxiliary channel was used to store a checksum. Also, every Z-box could have up to 1024 memory pages open. The total theoretical memory bandwidth of one EV7 was about 12GB/s. Since every EV7 in a multi-processor system had a memory area of its own, such a memory model was called NUMA (Non-Uniform Memory Access), unlike traditional SMP (Symmetric Multi-Processing), in which all processors installed in the system had access to a single (common) memory area. Thus, every processor in such a system (128 maximum) could access memory through its own controllers as well as through the controllers of other processors. The R-box handled communication between processors, and also between a particular processor and local peripherals. It supported 4 independent channels with a theoretical bandwidth of 6GB/s each (one per neighboring processor), and 1 additional channel for high-speed input/output transfers.
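
The RAID3-like use of the auxiliary channel can be sketched as follows. The text only says that a checksum is stored; the sketch assumes plain XOR parity, which is what RAID3 uses, and both function names are invented for this example.

```python
def stripe_quadword(value):
    """Split a 64-bit quadword into four 16-bit words plus an XOR parity word."""
    if not 0 <= value < 1 << 64:
        raise ValueError("expected an unsigned 64-bit value")
    # words[0] is the least significant 16 bits; one word per primary channel.
    words = [(value >> (16 * i)) & 0xFFFF for i in range(4)]
    parity = 0
    for word in words:
        parity ^= word          # assumed checksum: bitwise XOR of the four words
    return words, parity


def rebuild_word(words, parity, lost_channel):
    """Recover the word of one failed channel from the other three and the parity."""
    recovered = parity
    for channel, word in enumerate(words):
        if channel != lost_channel:
            recovered ^= word
    return recovered


words, parity = stripe_quadword(0x0123456789ABCDEF)
print([hex(w) for w in words], hex(parity))
# Pretend channel 2 has failed and rebuild its contents.
print(hex(rebuild_word(words, parity, lost_channel=2)))
```

Because XOR is its own inverse, the contents of any single failed channel can be rebuilt from the three surviving words and the parity word.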

Since EV7 inherited all the interfaces of EV6 internally, the processor must have had a unit supporting the EV6 system bus interface. Although this part of the processor design wasn't mentioned or documented anywhere, we can still make some assumptions about its performance. Since the minimal bus multiplier supported by EV6 was 3, the theoretical bandwidth of the bus leading to this unit was 3GB/s for EV7. Note that this was 4 times lower than what both Z-boxes could deliver together. It was a serious argument in favor of the initial EV7 application: high-end multi-processor systems.
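
The comparison above can be reproduced with a rough calculation in Python. It rests on assumptions that the text does not spell out: the figures of about 12 and 3 are read as gigabytes per second, only the 4 primary channels of each Z-box (16 data bits each) carry data, the bus multiplier is read as core clock divided by the bus data rate on a 64-bit bus, and the 1200MHz core clock is a hypothetical value. The function names are made up as well.

```python
from fractions import Fraction


def zbox_bandwidth_mb_s(core_mhz, zboxes=2, channels=4, data_bits=16):
    """Peak bandwidth of all primary memory channels, in MB/s."""
    transfers = Fraction(core_mhz) * Fraction(2, 3)   # MT/s per channel (DDR)
    return zboxes * channels * Fraction(data_bits, 8) * transfers


def ev6_bus_bandwidth_mb_s(core_mhz, multiplier=3, data_bits=64):
    """Peak bandwidth of an EV6-style system bus, in MB/s (assumed reading)."""
    return Fraction(core_mhz, multiplier) * Fraction(data_bits, 8)


core = 1200  # hypothetical core clock in MHz
memory = zbox_bandwidth_mb_s(core)
bus = ev6_bus_bandwidth_mb_s(core)
print(float(memory), float(bus), memory / bus)
```

Under these assumptions the two figures come out at 12800 and 3200 MB/s, and their ratio of 4 does not depend on the core frequency chosen.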

EV7 processors could be connected to each other in various topologies, but only the so-called "torus" and "shuffle" interconnects were implemented in real hardware. The second one was potentially more effective in some situations (for example, in 8-processor systems, "shuffle" allowed each processor to be connected directly to 4 others, while "torus" connected it to only 3). Of course, this difference no longer mattered in 12-processor systems.
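
The neighbor counts quoted for the torus can be reproduced with a short Python sketch. It assumes that each processor has 4 inter-processor links arranged as a 2D torus, and that 8 and 12 processors are laid out as 4x2 and 4x3 grids; these layouts and the function names are assumptions made for illustration. The exact wiring of "shuffle" is not described in the text, so it is not modeled here.

```python
def torus_neighbors(cols, rows, x, y):
    """Distinct processors reachable over the 4 torus links of node (x, y)."""
    linked = {
        ((x + 1) % cols, y),   # east
        ((x - 1) % cols, y),   # west
        (x, (y + 1) % rows),   # north
        (x, (y - 1) % rows),   # south
    }
    linked.discard((x, y))     # a link that wraps back to the node itself
    return linked


def distinct_neighbor_counts(cols, rows):
    """Set of distinct-neighbor counts over all nodes of a cols x rows torus."""
    return {len(torus_neighbors(cols, rows, x, y))
            for x in range(cols) for y in range(rows)}


print("8 processors (4x2):", distinct_neighbor_counts(4, 2))
print("12 processors (4x3):", distinct_neighbor_counts(4, 3))
```

In a grid with only two rows, the "north" and "south" links of a node lead to the same processor, so one of the four links is redundant; with three rows all four links reach different processors.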

This processor was manufactured on a 7-layer 0.18µ CMOS8 process, consisted of 152 mln transistors (including 137 mln for the I-cache, D-cache and S-cache), and therefore featured a very large die (397mm²). Prototypes were clocked at 1250MHz (TDP of 155W), though the processors installed in systems produced by HP ran at 1000MHz to 1150MHz. From an engineering point of view, EV7 was considerably inferior to the earlier members of the Alpha family in the density of functional units placed on the die. Of course, this affected the maximum core frequencies it could reach, the latencies of the S-cache, and, hence, performance.

In December 2002, HP issued a press release saying that the first EV7-based servers would be available in January 2003. Later, EV79 would be produced (using a 0.13µ SOI process), and that is when the development of the Alpha architecture would end. In March 2003, at ISSCC'2003, a prototype of EV79 was presented, with a 251mm² die requiring 1.2V and clocked at 1450MHz (TDP of 100W). But in October 2003 rumors about manufacturing difficulties leaked out of IBM, and half a year later the processor was finally cancelled.

In August 2004, the last Alpha processor was announced. It was EV7z, clocked at 1300MHz and manufactured on the same 0.18µ process. Like EV7, it was designed for HP's products only. HP also stated that servers and workstations based on the Alpha architecture would continue to be sold under the HP brand until 2006, and would be supported until 2011, but no longer than that.

The cancelled 21464 (EV8) was supposed to be a successor to EV7, with twice as many primary functional units (8 integer and 4 floating-point pipelines) and a 3MB S-cache. It was also supposed to support the new SMT technology (Simultaneous Multi-Threading), which implied concurrent execution of up to 4 software threads inside a single core (maybe this technology was related somehow to HyperThreading from Intel). The die, to be manufactured on a 0.13µ SOI process, was expected to measure 420mm².

Epilogue

When we were working on this article, Alpha systems were still being offered, mostly through HP and Microway. The latter even listed relatively inexpensive workstations based on 21164A and AlphaPC 164LX for Linux ($2000 for a standard configuration). Many retired but still working workstations and servers, as well as parts, were offered on "online flea markets". Most of those systems ran Windows NT, and many of them would accept neither Digital UNIX nor OpenVMS, and some not even *BSD (systems with no SRM console available), though it could still be possible to install Linux from ARC/AlphaBIOS. If you have any intention of purchasing an Alpha system, clarify this question before spending your money. This will save you a lot of trouble later.

According to the statistics, DEC and Compaq had sold about 800K Alpha workstations and servers by June 2001. There is no exact number showing how many systems were assembled and sold by other companies, but it is estimated to be over 500K.

Many people say that the Alpha architecture died a natural death. We hope this article shows clearly that this is not true, and that the Alpha architecture was simply buried alive. Because it was better this way.

There have been many cases in history when a poorly crafted product prevailed over a better one. Maybe the first product cost much less than the second. Or maybe the second product was not promoted as actively in the market. Or the license fees were incomparable. Anything is possible. Some would admit that the fear of losing their jobs sometimes drives marketing people to act very aggressively, even if the product they are pushing into the market is not so bright. One thing is truly evident: the technical specifications of a product are often not the top priority on the way to market success.

Life goes on...

Bibliography:

1. Rich Witek, Dick Sites. Alpha Architecture Technical Summary, 1992.

2. Richard L. Sites. Alpha AXP Architecture, Digital Technical Journal, Vol. 4, No. 4, Special Issue, 1992.

3. Daniel W. Dobberpuhl, and others. A 200-MHz 64-bit Dual-issue CMOS Microprocessor, Digital Technical Journal, Vol. 4, No. 4, Special Issue, 1992.

4. Edward McLellan. The Alpha AXP Architecture and 21064 Processor, IEEE Micro, 1993.

5. Dina L. McKinney, and others. Digital's DECchip 21066: The First Cost-focused Alpha AXP chip, Digital Technical Journal, 1994.

6. Robert Couranz. The E2COTS System and Alpha AXP Technology: The New Computer Standard for Military Use, Digital Technical Journal, Vol. 6, No. 2, 1994.

7. Samyojita A. Nadkarni, and others. Development of Digital's PCI Chip Sets and Evaluation Kit for the DECchip 21064 Microprocessor, Digital Technical Journal, Vol. 6, No. 2, 1994.

8. Linley Gwennap. Digital Leads the Pack with 21164, Microprocessor Report, Vol. 8, No. 12, 1994.

9. William J. Bowhill, and others. Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU, Digital Technical Journal, Vol. 7, No. 1, 1995.

10. David P. Hunter, Eric B. Betts. Measured Effects of Adding Byte and Word Instructions to the Alpha Architecture, Digital Technical Journal, Vol. 8, No. 4, 1996.

11. Linley Gwennap. Digital, MIPS Add Multimedia Extensions, Microprocessor Report, Vol. 10, No. 15, 1996.

12. Daniel Leibholz, Rahul Razdan. The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor, Proceedings of IEEE COMPCON'97, 1997.

13. Michael K. Gowan, Larry L. Biro, Daniel B. Jackson. Power Considerations in the Design of the Alpha 21264 Microprocessor, DAC 98, June 15-19, 1998.

14. Linley Gwennap. Compaq, Intel Fight Digital Brain Drain, Microprocessor Report, Vol. 12, No. 14, October 26, 1998.

15. Linley Gwennap. Alpha 21364 to Ease Memory Bottleneck, Microprocessor Report, Vol. 12, No. 14, October 26, 1998.

16. M. Matson, and others. Circuit Implementation of a 600 MHz Superscalar RISC Microprocessor, Compaq Technology Journal, 1998.

17. Chart Watch: Workstation Processors, Microprocessor Report, May 10, 1999.

18. Daniel W. Bailey. High-Performance Alpha Microprocessor Design, Compaq Computer Corporation, 1999.

19. Exploring Alpha Power for Technical Computing, Compaq Technology Brief, April 2000.

20. Zarka Cvetanovic. Performance Analysis of the Alpha 21364-based HP GS1280 Multiprocessor, Hewlett-Packard Corporation, 2002.

21. Kevin Krewell. Alpha EV7 Processor: A High-Performance Tradition Continues, Microprocessor Report, April 5, 2002.

22. Ronald P. Preston. Design of an 8-wide Superscalar RISC Microprocessor with Simultaneous Multithreading, Compaq Computer Corporation, ISSCC Report, 2002.

23. Peter N. Glaskowsky. Moore, Moore, and More at ISSCC, Microprocessor Report, March 23, 2003.

We also used numerous technical documents from DEC and Compaq.

Special Thanks To:

We would like to thank Wikipedia for the information about DEC's early history and its products of those days. Special thanks also go to Terry Shannon for his regular and informative newsletter, "Shannon Knows {DEC, Compaq, HPC}".

This paper contains information collected from many unofficial Internet resources, the full list of which is too long to include here. We would like to thank the people of the online community for the interesting facts, comments, and points of view they shared with us.

The photographs of EV4 and EV6 are courtesy of cpu-collector.com.

The author would like to personally thank ISA_user, VLev, Yury_Malich, Stranger_NN, and, of course, matik for their valuable contributions to the article!