EV7, EV79, EV7z, EV8
The first news about the 21364 (EV7) architecture came from Microprocessor Forum in October 1998. The processor was to be based on the EV6 core, but with an integrated Direct Rambus DRAM controller (presumably 4-channel) and an L2 cache (1.5MB, 6-way set associative). It was also stated that there was no intention to modify the EV6 core, though there could have been another reason for that: few design engineers were left at Compaq, so hardly anyone could take on such a hard task. The design was expected to be completed by 2000.
Having acquired Compaq, HP inherited the Alpha architecture, which was of little interest to them at that point, because they had their own 64-bit PA-RISC architecture (Precision Architecture RISC) and were allied with Intel to develop the IA-64 architecture (i.e. Itanium). So HP's actions regarding Alpha were limited to selling the EV6/EV67/EV68-based servers inherited from Compaq and launching EV7 into production, which was finally presented in January 2002.
As expected, EV7 featured the EV68 core (completely unchanged) plus several newly integrated units: two memory controllers (two Z-boxes, for Direct Rambus DRAM PC800), a multi-functional router (R-box, for multi-processor support and networking), and a full-speed L2 cache (S-cache, 1.75MB, 7-way set associative). The S-cache bus was 128 bits wide, and the cache itself had significant latency (12 clocks for a read). Both Z-boxes and the R-box were clocked at 2/3 of the core frequency. The memory channels ran at half the Z-box frequency (that is, 1/3 of the core frequency), but they used DDR signaling, transferring data twice per clock.
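The clock relationships described above can be checked with a little arithmetic. The following is only a sketch: the 1200 MHz core clock is an assumption (the text does not give a core frequency; this value is picked because it makes the channel rate land on the 800 MT/s of PC800), and the function name `ev7_clocks` is hypothetical.

```python
from fractions import Fraction

def ev7_clocks(core_mhz):
    """Derive Z-box/R-box and memory-channel rates from the core clock (MHz)."""
    core = Fraction(core_mhz)
    box = core * Fraction(2, 3)   # Z-boxes and R-box run at 2/3 of the core clock
    channel = box / 2             # channel clock is half the Z-box clock (1/3 of core)
    transfers = channel * 2       # DDR: two transfers per channel clock
    return {"box_mhz": box, "channel_mhz": channel, "channel_mtps": transfers}

# Assumed core frequency, not stated in the text.
clocks = ev7_clocks(1200)
for name, value in clocks.items():
    print(f"{name}: {float(value):.0f}")
```

Exact fractions are used so that the 2/3 and 1/3 ratios do not pick up floating-point rounding.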
Every Z-box supported 5 memory channels (4 primary and 1 auxiliary), each 18 bits wide (16 bits for commands/data/addresses, 2 bits for ECC). The auxiliary channel was optional and could be used to build a fault-tolerant array in memory (roughly speaking, like RAID3). For example, a quadword (64 bits) being written to memory was divided into 4 words (16 bits each), each word was sent through a dedicated channel, and the auxiliary channel was used to store a checksum. Also, every Z-box could keep up to 1024 memory pages open. The total theoretical memory bandwidth of one EV7 was about 12GB/s.
Since every EV7 in a multi-processor system had a memory area of its own, this memory model is called NUMA (Non-Uniform Memory Access), unlike traditional SMP (Symmetric Multi-Processing), in which all processors installed in the system access a single common memory area. Thus, every processor in such a system (128 maximum) could access memory through its own controllers as well as through the controllers of other processors. The R-box handled communication between processors, and also between a particular processor and its local peripherals. It supported 4 independent channels with a theoretical bandwidth of 6GB/s each (one per neighboring processor), and 1 additional channel for high-speed input/output transfers.
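The RAID3-like use of the auxiliary channel can be illustrated with a small sketch. It assumes the "checksum" is a simple XOR parity word, as in RAID3; the text does not say which code the Z-box actually used. The word ordering (lowest 16 bits on channel 0) and all function names are likewise hypothetical.

```python
from functools import reduce

WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1

def stripe_quadword(qword):
    """Split a 64-bit value into four 16-bit words plus one XOR check word."""
    words = [(qword >> (WORD_BITS * i)) & WORD_MASK for i in range(4)]
    check = reduce(lambda a, b: a ^ b, words)  # assumed parity scheme
    return words, check

def rebuild_word(words, check, lost):
    """Recover the word of a failed channel from the survivors and the check word."""
    survivors = [w for i, w in enumerate(words) if i != lost]
    return reduce(lambda a, b: a ^ b, survivors, check)

def join_words(words):
    """Reassemble the 64-bit value from its four 16-bit words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

q = 0x0123456789ABCDEF
words, check = stripe_quadword(q)
recovered = rebuild_word(words, check, lost=2)  # pretend channel 2 failed
print(hex(recovered), hex(join_words(words)))
```

With XOR parity, any single lost word equals the XOR of the remaining three words and the check word, which is why one auxiliary channel is enough to survive the failure of one primary channel.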
Since EV7 inherited all the internal interfaces of EV6, the processor must have had a unit supporting the EV6 system bus interface. Although this part of the processor design was not mentioned or documented anywhere, we can still make some assumptions about its performance. Since the minimum bus multiplier supported by EV6 was 3, the theoretical bandwidth of the bus leading to this unit was 3GB/s for EV7. Note that this was 4 times lower than what both Z-boxes could deliver together. That was a serious argument in favor of EV7's initial application: high-end multi-processor systems.
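The "4 times lower" comparison can be reproduced from the channel parameters given earlier. This is a rough sketch under stated assumptions: only the 16 data bits of the 4 primary channels per Z-box are counted, the transfer rate is taken as 800 MT/s from the PC800 designation, and the 3 gigabytes per second bus figure is simply the one quoted in the text. The function name is hypothetical.

```python
def zbox_bandwidth_gbs(channels=4, data_bits=16, mtps=800):
    """Peak bandwidth of one Z-box in gigabytes per second (primary channels only)."""
    bytes_per_transfer = data_bits / 8
    return channels * bytes_per_transfer * mtps / 1000

memory_gbs = 2 * zbox_bandwidth_gbs()  # two Z-boxes per EV7
bus_gbs = 3.0                          # EV6-style bus figure quoted in the text
ratio = memory_gbs / bus_gbs

print(f"memory: {memory_gbs:.1f} GB/s, bus: {bus_gbs:.1f} GB/s, ratio: {ratio:.2f}")
```

Under these assumptions the two Z-boxes together come out slightly above the "about 12" figure, and the ratio to the bus is a little over 4, in line with the estimate above.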



