Now let’s look more closely at the Itanium micro-architecture. As I mentioned above, part of the logic that would otherwise take up die space (and that scales worse in frequency than other chip units) has been moved out into the compiler. That is all the more appropriate since the Itanium doesn’t (yet?) use sophisticated performance techniques like out-of-order execution. As a result, the processor contains six (!) integer pipelines, two FPU pipelines, one SIMD unit with SSE2 support, two load units, two store units and three branch processing units (see below). The processor is designed to perform six operations per clock cycle. Accordingly, the die can boast "grownup" dimensions (up to 374 sq. mm in the Madison core) and a huge number of transistors: 221 million for a model with 1.5MB cache, and about half a billion for the Madison with 6MB L3 cache.
This design is backed by a cache hierarchy of an appropriate size: a 16KB+16KB L1 cache for data and instructions (its latency is only 1 cycle!), a 256KB L2 cache and up to 6MB of L3 cache. A variant of the processor with 9MB L3 cache is also on the way, and larger caches will follow: Intel promises to expand the cache to 12MB and then to 24MB per processor in the future!
By the way, the Itanium architecture may be the most cache-dependent of all. First, 64-bit code and data take more space than 32-bit code and data; second, the shared-bus architecture (Intel clings to it even in high-end systems) shows better results when processors have more cache memory. Besides that, some systems (like those on the Summit chipset from IBM) also have an L4 cache (!) to increase the overall system performance (the Summit contains up to 64MB of L4 cache).
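To put a rough number on the first point, here is a small back-of-the-envelope sketch. It is my own illustration, not a measurement: the node layout (two pointers plus a 4-byte payload, padded to pointer alignment) and the helper names `node_size` and `nodes_in_cache` are assumptions made up for the example. It only shows how doubling the pointer width can halve the number of list or tree nodes that fit into the same 6MB L3 cache.

```python
# Hypothetical node layout: two pointers plus a 4-byte payload,
# padded up to a multiple of the pointer size (a common alignment rule).
def node_size(pointer_bytes, payload_bytes=4, pointers_per_node=2):
    raw = pointers_per_node * pointer_bytes + payload_bytes
    align = pointer_bytes
    return (raw + align - 1) // align * align  # round up to alignment

def nodes_in_cache(cache_bytes, pointer_bytes):
    # How many whole nodes fit into a cache of the given size.
    return cache_bytes // node_size(pointer_bytes)

L3_CACHE = 6 * 1024 * 1024  # the 6MB L3 cache mentioned in the article

for bits, ptr in ((32, 4), (64, 8)):
    print(bits, "bit:", node_size(ptr), "bytes per node,",
          nodes_in_cache(L3_CACHE, ptr), "nodes fit in L3")
```

Under these assumptions a node grows from 12 to 24 bytes, so the same cache holds half as many of them. Real programs mix pointers with data that does not grow, so the actual penalty is usually smaller.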
Curiously enough, the two server processor series from Intel - Xeon and Itanium - take different routes to higher performance. The Xeon, in particular, bets on frequency growth. Its pipeline was made longer just for that: for example, the pipeline grew from 10 to 20 stages on the transition from the Pentium III Xeon to the Xeon (on the Pentium 4 core). Intel went in the opposite direction with the Itanium: the pipeline shrank from 10 to 8 stages during the transition from the first version of the processor to the second! Cache latencies were also reduced.
Although the Itanium 2 (that’s the official name of the second version, the one shipping currently) may be simpler than other processors in some respects (this follows from the very concept of EPIC - moving as much of the processor logic out into the compiler as possible), it has some points of interest. The concept, by the way, implies that the processor’s performance depends greatly on the work of the compiler - sometimes more than on the architectural traits or the frequency.
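The dependence on the compiler can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the real Itanium bundling or template rules: the function `schedule`, the unit latency of every operation and the operation names are all hypothetical. It greedily packs operations into groups of at most six per cycle, the issue width quoted above, and shows that the cycle count is set by how much independent work the compiler can find.

```python
def schedule(ops, deps, width=6):
    """Greedily pack ops into issue groups of at most `width` ops.

    ops:  operation names in program order.
    deps: dict mapping an op to the set of ops whose results it needs.
    Every op is assumed to take one cycle (a simplification).
    Returns a list of groups; each group issues in one cycle.
    """
    done = set()
    remaining = list(ops)
    groups = []
    while remaining:
        # An op is ready only if all its inputs come from earlier cycles.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependencies")
        group = ready[:width]
        groups.append(group)
        done.update(group)
        remaining = [op for op in remaining if op not in group]
    return groups

ops = ["a", "b", "c", "d", "e", "f"]

# Six independent operations: all of them fit into one cycle.
print(len(schedule(ops, {})))  # 1

# The same six operations as a dependency chain a -> b -> ... -> f.
chain = {ops[i]: {ops[i - 1]} for i in range(1, len(ops))}
print(len(schedule(ops, chain)))  # 6
```

The same six operations take one cycle or six depending purely on their dependencies. An in-order processor cannot reorder them at run time, so exposing that parallelism is left to the compiler.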