EV4, LCA4, EV45, LCA45
The first processor of the Alpha family was called the 21064 (21 indicated that Alpha was an architecture for the 21st century, 0 was the processor generation, and 64 the word width in bits). It was also code-named EV4 (EV was supposedly an abbreviation of "Extended VAX", and 4 stood for the process generation, CMOS4, where CMOS means "Complementary Metal Oxide Semiconductor"). I should point out that the EV4 prototype was ready in 1991 but was built on the older CMOS3 process, which is why it featured smaller caches and no floating-point unit. Nevertheless, it was an important milestone for tuning and polishing the architecture and the software environment. EV4 was introduced in November 1992 and was manufactured on a 3-layer 0.75µ process, advanced for its day (later it was moved to 0.675µ CMOS4S, an optical shrink of CMOS4). It was designed for a 3.3V supply and core frequencies ranging from 150MHz to 200MHz (TDP from 21W to 27W). The chip consisted of 1.68 million transistors on a 233mm² die. EV4 supported multiprocessing, which was one of the key features of the architecture. It was packaged as a PGA-431 (Pin Grid Array).
The L1 cache was integrated: 8KB for instructions (I-cache, instruction cache), direct-mapped, and 8KB for data (D-cache, data cache), direct-mapped and write-through. The read latency of the D-cache was 3 clocks. Every line of the I-cache consisted of 32 bytes of instructions, a 21-bit tag, an 8-bit branch history field, and several auxiliary fields. Every line of the D-cache consisted of 32 bytes of data and a 21-bit tag. The L2 cache (B-cache, backup cache) was a recommended option built from external synchronous or asynchronous SRAM chips; it was direct-mapped, write-back, write-ahead, and sized up to 16MB (usually from 512KB to 2MB). Every line consisted of 32 bytes of data or instructions protected by either 1-bit parity or 7-bit ECC per longword, a tag of up to 17 bits with an additional parity bit, and a 3-bit status field with its own parity bit. The read and write timing of the B-cache was programmable and measured in processor clocks. The system data bus was either 64 or 128 bits wide (programmable, with the same choice of 1-bit parity or 7-bit ECC per longword) and was multiplexed with the B-cache data bus, switching between the two as necessary. The system address bus was 34 bits wide. The B-cache was inclusive of the D-cache, i.e. it always contained a full copy of the latter. A mechanism called “victim write” was used to write data back from the B-cache to memory. Only the processor could read and write the B-cache; the system logic could only read the B-cache tags, which was especially important in multiprocessor systems for maintaining cache coherence among all processors in a machine.
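The tag widths quoted above can be cross-checked against the cache geometry. The sketch below is only an illustration: the class name `DirectMappedGeometry` and its fields are made up for this example, and it assumes that the tag simply holds whatever bits of the 34-bit address are left after the line offset and the line index are removed. Under that assumption an 8KB direct-mapped cache with 32-byte lines needs exactly the 21 tag bits mentioned in the text, and a 17-bit tag would correspond to a 128KB B-cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectMappedGeometry:
    """Hypothetical helper describing a direct-mapped cache (sizes must be powers of two)."""
    cache_bytes: int
    line_bytes: int
    address_bits: int  # assumed to be the 34-bit address from the text

    @property
    def lines(self) -> int:
        return self.cache_bytes // self.line_bytes

    @property
    def offset_bits(self) -> int:
        # Bits that select a byte inside one line.
        return (self.line_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # Bits that select the single line an address can occupy.
        return (self.lines - 1).bit_length()

    @property
    def tag_bits(self) -> int:
        # Assumption: the tag stores all remaining high-order address bits.
        return self.address_bits - self.index_bits - self.offset_bits

    def split(self, address: int) -> tuple[int, int, int]:
        """Return (tag, index, offset) for an address."""
        offset = address & (self.line_bytes - 1)
        index = (address >> self.offset_bits) & (self.lines - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset


l1 = DirectMappedGeometry(cache_bytes=8 * 1024, line_bytes=32, address_bits=34)
small_bcache = DirectMappedGeometry(cache_bytes=128 * 1024, line_bytes=32, address_bits=34)

print(l1.lines, l1.offset_bits, l1.index_bits, l1.tag_bits)
print(small_bcache.tag_bits)

# Two addresses exactly one cache size apart land on the same line
# and differ only in the tag, so they evict each other.
print(l1.split(0x2345), l1.split(0x2345 + 8 * 1024))
```

The last line shows the main weakness of a direct-mapped organization: any two addresses that are a multiple of the cache size apart compete for the same line.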
The processor had one integer pipeline (E-box, 7 stages) and one floating-point pipeline (F-box, 10 stages). The instruction decoder and scheduler (I-box) could issue up to 2 instructions per clock to the execution units, namely the E-box, the F-box, and the load/store unit (A-box). The cache and system bus controller (C-box) worked together with the A-box and managed both the integrated I-cache and D-cache and the external B-cache. The branch prediction unit maintained a 4096-entry branch prediction table with 2 bits per entry. There was an I-TLB with 8 entries for 8KB pages and 4 entries for 4MB pages, and a D-TLB with 32 entries; all were fully associative.
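To make the branch prediction table more concrete, here is a minimal sketch of how a 4096-entry table with 2 bits per entry could behave. The text gives only the table size and the entry width, so everything else is an assumption made for illustration: the entries are treated as the common 2-bit saturating counters, the table is indexed by the low bits of the instruction address (shifted by 2, since Alpha instructions are 4 bytes long), and the initial counter value is arbitrary. The names `TwoBitPredictor` and `count_mispredictions` are hypothetical.

```python
class TwoBitPredictor:
    """Illustrative 2-bit saturating-counter predictor (not the documented EV4 logic)."""

    def __init__(self, entries: int = 4096):
        self.entries = entries
        # Counter values 0..3; 0-1 predict not taken, 2-3 predict taken.
        # Starting at 1 ("weakly not taken") is an assumption.
        self.table = [1] * entries

    def _index(self, pc: int) -> int:
        # Assumed indexing: drop the 2 byte-offset bits of a 4-byte instruction,
        # then keep the low bits that fit the table.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor: TwoBitPredictor, pc: int, outcomes: list[bool]) -> int:
    """Feed a sequence of real branch outcomes and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through, executed 3 times over.
loop_outcomes = ([True] * 9 + [False]) * 3
predictor = TwoBitPredictor()
misses = count_mispredictions(predictor, pc=0x1000, outcomes=loop_outcomes)
print(misses, "mispredictions out of", len(loop_outcomes))

# Storage needed for the table itself: 4096 entries * 2 bits.
table_bytes = predictor.entries * 2 // 8
print(table_bytes, "bytes of predictor state")
```

The point of keeping 2 bits instead of 1 is visible in the loop example: a single fall-through at the end of the loop does not flip the prediction, so the next pass through the loop starts with a correct "taken" guess.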

Despite its excellent performance, EV4 was too expensive for most potential customers, so a lower-priced sibling was released in September 1993: the 21066 (LCA4, or LCA4S). It was based on the EV4 core but added integrated memory and PCI controllers, as well as several secondary functions. I have to stress, though, that the system data bus width was reduced to 64 bits, which hurt performance. LCA4 was manufactured on the 0.675µ CMOS4S process, with a die even smaller than that of EV4 (209mm² compared to 233mm²). It also ran at lower clock frequencies, 100MHz to 166MHz, presumably to avoid overheating in the poorly ventilated desktop cases of those days. Besides, the company also wanted to avoid creating an additional competitor to EV4. The newcomer contained 1.75 million transistors and required a 3.3V supply. The design of this processor was licensed to Mitsubishi, which also manufactured LCA4 (including a 200MHz version).





