We now come to the microprocessors made by Sun Microsystems. This well-known and respected company is one of the leaders in the server and workstation market. Sun relies on 64-bit processors of its own design: the UltraSPARC III and the UltraSPARC IIIi, which differs by having an integrated 1MB four-way set-associative L2 cache.
The UltraSPARC III contains about 29 million transistors. The IIIi model naturally has more because of its 1MB L2 cache, roughly 85 million transistors. The architecture is designed to execute up to four instructions per clock cycle, which is the maximum rate at which instructions are fetched from the cache. The processor has a 64KB four-way set-associative L1 instruction cache and a 64KB four-way set-associative L1 data cache. The L1 cache is quite fast: its access latency is only two cycles, which is good for a cache of that size. Execution units: two 64-bit ALUs, one branch prediction unit (its branch history table holds as many as 16K entries), one load/store unit, and two floating-point units (one performs addition/subtraction, the other multiplication/division). The chip also contains a special 2KB buffer for writing preliminary data (I'll explain its role shortly). The UltraSPARC III works with an external cache of up to 8MB, and the tags of this L2 cache are kept on the processor for faster lookups. The external cache runs at one fourth of the CPU frequency (300MHz for a 1200MHz CPU).
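The cache figures above translate directly into cache geometry and clock arithmetic. The following minimal sketch computes the number of sets in a set-associative cache as capacity / (ways * line size) and the clock of a cache running at a fraction of the CPU frequency. The article does not give the cache line size, so the 32-byte line is an assumption for illustration only, and the helper names are hypothetical.

```python
# Cache geometry and clock arithmetic for the caches described above.
# ASSUMPTION: the article does not state the cache line size; 32 bytes is a
# hypothetical value chosen only for illustration.

KB = 1024
LINE_BYTES = 32  # hypothetical line size


def num_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets in a set-associative cache: capacity / (ways * line size)."""
    if capacity_bytes % (ways * line_bytes):
        raise ValueError("capacity must be a multiple of ways * line size")
    return capacity_bytes // (ways * line_bytes)


def divided_clock_mhz(cpu_mhz: float, divider: int = 4) -> float:
    """Clock of a cache that runs at 1/divider of the CPU frequency."""
    return cpu_mhz / divider


print(num_sets(64 * KB, 4))      # 64KB four-way L1 data cache -> 512 sets
print(num_sets(1024 * KB, 4))    # 1MB four-way L2 of the IIIi -> 8192 sets
print(divided_clock_mhz(1200))   # external cache of a 1200MHz CPU -> 300.0
```

With a different line size the set counts scale inversely, while the clock figure depends only on the divider.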
The processor does not support out-of-order execution. Instructions are held in a 16-entry instruction buffer, where they wait for the appropriate execution units to become free. Naturally, performance suffers without such a mechanism, but the UltraSPARC III offers something else instead.
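Why the lack of out-of-order execution hurts can be shown with a toy issue model: instructions leave the 16-entry buffer strictly in order, up to four per cycle, so a single instruction waiting for a busy unit blocks everything behind it. For comparison, the same function can let later instructions overtake a blocked one. This is a sketch under loud assumptions: data dependencies are ignored, the unit counts follow the list of execution units above, and the busy times (such as an 8-cycle occupancy of the multiply/divide unit) are hypothetical.

```python
# Toy model: how one stalled instruction blocks in-order issue.
# ASSUMPTIONS (illustrative, not from the article): data dependencies are
# ignored and the busy times passed in the program are made up.

UNITS = {"alu": 2, "ls": 1, "fadd": 1, "fmul": 1, "br": 1}  # fmul = mul/div FPU
ISSUE_WIDTH = 4   # up to four instructions per cycle
BUFFER_SIZE = 16  # 16-entry instruction buffer


def issue_cycles(program, in_order=True):
    """Cycles needed to issue every (unit, busy_cycles) pair in program."""
    busy_until = {u: [0] * n for u, n in UNITS.items()}  # per unit instance
    pending = list(program)
    buffer = []
    cycle = 0
    while pending or buffer:
        # Refill the instruction buffer from the program stream.
        while pending and len(buffer) < BUFFER_SIZE:
            buffer.append(pending.pop(0))
        issued = 0
        i = 0
        while i < len(buffer) and issued < ISSUE_WIDTH:
            unit, busy = buffer[i]
            slots = busy_until[unit]
            free = next((k for k, t in enumerate(slots) if t <= cycle), None)
            if free is None:
                if in_order:
                    break   # the blocked instruction stalls everything behind it
                i += 1      # out-of-order: try the next instruction instead
                continue
            slots[free] = cycle + busy  # unit occupied for `busy` cycles
            buffer.pop(i)
            issued += 1
        cycle += 1
    return cycle


prog = [("fmul", 8), ("fmul", 8)] + [("alu", 1)] * 8
print(issue_cycles(prog))                  # 12
print(issue_cycles(prog, in_order=False))  # 9
```

In the in-order case the eight independent ALU operations sit behind the second multiplication until the multiply/divide unit frees up; letting them overtake hides most of that wait.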
First, the result of an operation becomes available to subsequent instructions at the very next stage after it is computed, rather than only after the instruction has passed through the entire pipeline. For example, suppose the product of two numbers is ready at stage 8. The next instruction that needs this result does not have to wait six more cycles (until the end of the 14-stage pipeline); it proceeds to execution in the next cycle. This is possible thanks to a register file hidden from programmers: its auxiliary registers hold the intermediate results.
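The saving from forwarding results can be put into numbers. This simplified sketch uses the figures from the example (a result ready at stage 8 of a 14-stage pipeline) and assumes that without forwarding a value becomes readable only after its instruction leaves the last stage. Other hazards are ignored, and the function names are hypothetical.

```python
# Stall arithmetic for result forwarding (bypassing).
# ASSUMPTION: without forwarding, a result is usable only after the producing
# instruction leaves the last pipeline stage; other hazards are ignored.

PIPELINE_DEPTH = 14  # stages in the UltraSPARC III pipeline (from the text)


def stall_cycles(result_stage: int, depth: int = PIPELINE_DEPTH,
                 bypass: bool = True) -> int:
    """Extra cycles a dependent instruction waits once the result exists."""
    if not 1 <= result_stage <= depth:
        raise ValueError("result_stage must lie inside the pipeline")
    # With forwarding the value is handed over in the next cycle (no extra
    # wait); without it, the remaining stages must be traversed first.
    return 0 if bypass else depth - result_stage


def chain_penalty(n_ops: int, result_stage: int, bypass: bool = True) -> int:
    """Total stall cycles for n_ops instructions, each using the previous result."""
    return max(n_ops - 1, 0) * stall_cycles(result_stage, bypass=bypass)


print(stall_cycles(8, bypass=False))       # 6
print(stall_cycles(8))                     # 0
print(chain_penalty(10, 8, bypass=False))  # 54
```

The penalty grows linearly with the length of a dependency chain, which is why forwarding matters most for tight chains of dependent arithmetic.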
Secondly, when it encounters a branch, the processor follows the most probable path and stores up to four instructions from the alternative path in a special buffer. Thus, after an incorrect prediction the processor can continue working without waiting for the code to be loaded from memory.
The pipeline is 14 stages long. The maximum frequency of the UltraSPARC III is 1200MHz, and that of the IIIi model is 1280MHz. By the way, these clock rates are not very high considering the length of the pipeline and the 130nm process with copper interconnects.
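To illustrate why the alternative-path buffer helps, here is a toy model of the issue cycles lost after a misprediction. The four saved instructions keep the pipeline busy while the code that follows them is fetched; without the buffer, every refetch cycle is idle. The refetch latency and the achievable issue rate are hypothetical inputs, the real misprediction penalty of the UltraSPARC III is not modelled, and the function names are made up for illustration. Lost cycles are converted to nanoseconds at the 1200MHz clock mentioned above.

```python
import math

# Toy model of the alternative-path buffer after a branch misprediction.
# ASSUMPTIONS: refetch_latency and issue_per_cycle are hypothetical inputs;
# this is not the actual UltraSPARC III misprediction penalty.

ALT_PATH_SLOTS = 4  # instructions saved from the alternative path (from the text)


def idle_cycles(refetch_latency: int, issue_per_cycle: int = 4,
                use_alt_buffer: bool = True) -> int:
    """Cycles with nothing to issue after a mispredicted branch resolves."""
    saved = ALT_PATH_SLOTS if use_alt_buffer else 0
    # Cycles kept busy by issuing the saved alternative-path instructions.
    covered = math.ceil(saved / issue_per_cycle)
    return max(refetch_latency - covered, 0)


def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000.0 / clock_mhz


lost = idle_cycles(5, use_alt_buffer=False)
print(lost)                                # 5
print(idle_cycles(5))                      # 4 (saved ops issue in one cycle)
print(idle_cycles(5, issue_per_cycle=1))   # 1 (saved ops issue one per cycle)
print(round(cycles_to_ns(lost, 1200), 3))  # 4.167
```

How much the buffer hides depends on how quickly the saved instructions can actually issue; the fewer per cycle, the more of the refetch latency they cover.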