Let’s try to determine what a processor’s performance depends on, without tying ourselves to any particular implementation. Ultimately, software for any processor is reduced to a sequence of elementary arithmetic and logical operations. Thus, a processor is faster if it can perform more such operations in a given period of time.
The number of calculations a processor can perform depends on two basic things: how many elementary operations it performs in one clock cycle and what frequency it runs at. The number of operations per second is then estimated by multiplying the frequency by the number of operations per clock cycle. Accordingly, there are two ways to reach higher performance: either create an architecture oriented toward doing more work per clock cycle (processors with a “heavy” clock) or push the frequency as high as possible. Of course, in an ideal world we would do both at once, but that seldom happens in reality. As a result, the architecture development team usually has to choose which direction to prioritize.
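The estimate described above can be written down as a minimal sketch. The function name and the two sets of numbers are hypothetical and chosen only for illustration; they do not describe any real processor.

```python
def peak_ops_per_second(frequency_hz: float, ops_per_cycle: float) -> float:
    """Estimate peak throughput as frequency times operations per clock cycle."""
    return frequency_hz * ops_per_cycle


# Two hypothetical designs (illustrative numbers, not real products).
heavy_clock = peak_ops_per_second(1.5e9, 6)  # lower frequency, more work per cycle
high_clock = peak_ops_per_second(3.0e9, 3)   # higher frequency, less work per cycle

print(f"heavy clock: {heavy_clock:.1e} ops/s")
print(f"high clock:  {high_clock:.1e} ops/s")
```

With these assumed figures, both designs arrive at the same peak estimate, which is why the two approaches can compete with each other at all. This is only an upper bound: it says nothing about how often the execution units can actually be kept busy.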
Many developers of server processors chose the “heavy” clock approach. In other words, their priority is to execute as many operations per clock cycle as possible. Manufacturers resort to various tricks to achieve this. For example, they integrate two processor cores on one die, as IBM did in its Power4. Intel pursued both directions in its two server processor series: the Xeon is more of a frequency-oriented CPU, while the Itanium family consists of processors designed to do a lot of work per cycle. The two series differ greatly in both frequency and performance, and the lower-clocked Itanium has the advantage. Both directions involve certain difficulties, though.
There are serious obstacles to further increasing the work performed per cycle. Many algorithms (in some areas, the majority) are not easily parallelized because they are sequential by nature. Thus, it takes some effort, such as recompiling the software, to keep as many of the processor’s execution units busy as possible. Technologies like Intel’s Hyper-Threading serve the same purpose: they give work to execution units that would otherwise sit idle. We will see below that most RISC processors have stayed at the level of 6 pipelines, because more pipelines don’t provide any performance gain. The only exception is the Itanium family, but this comes at a high cost: the performance of this processor depends heavily on the quality of the compiler. To summarize: the increase in work performed per clock cycle is limited by the nature of many existing algorithms.
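To make the limit concrete, here is a small sketch of a toy scheduler. It is an assumption-laden model, not a description of any real CPU: every operation takes exactly one cycle, the function `cycles_needed` and the two example workloads are hypothetical, and the scheduling is a simple greedy pass.

```python
def cycles_needed(deps: dict[str, list[str]], units: int) -> int:
    """Count the cycles a toy machine with `units` execution units needs.

    `deps` maps each operation to the operations whose results it requires.
    Every operation takes one cycle and may start only after all of its
    inputs were completed in an earlier cycle.
    """
    done: set[str] = set()
    cycles = 0
    while len(done) < len(deps):
        # Operations whose inputs are all available at the start of this cycle.
        ready = [op for op in deps
                 if op not in done and all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("unsatisfiable dependencies")
        done.update(ready[:units])  # issue at most `units` operations per cycle
        cycles += 1
    return cycles


# A sequential algorithm: each step needs the result of the previous one.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"], "f": ["e"]}
# Six operations that do not depend on each other.
independent = {name: [] for name in "abcdef"}

for units in (1, 2, 6):
    print(units, cycles_needed(chain, units), cycles_needed(independent, units))
```

In this model the independent workload speeds up as execution units are added, while the dependent chain takes six cycles no matter how many units are available. Extra units help only if the code offers independent work to fill them.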
Frequency growth also brings a series of unpleasant surprises along with predictable consequences. It raises the processor’s power consumption and heat dissipation. In addition, any circuit, especially the sophisticated circuitry of a modern processor, has a frequency ceiling. This ceiling arises mainly from the need to synchronize the different units of the processor. There is always a tiny difference between the operational timings of different units of the CPU, and at a certain frequency the units start to fall out of sync. This frequency is the limit imposed by the architecture of the particular processor.
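The idea of a frequency ceiling can be sketched with a deliberately simplified timing model. The model is an assumption rather than something taken from a specific design: the clock period must be long enough for the slowest unit to finish, plus a margin for the timing mismatch (skew) between units. The function name and all delay values are hypothetical.

```python
def max_frequency_hz(unit_delays_ns: list[float], skew_ns: float) -> float:
    """Highest clock frequency at which the slowest unit still fits in one period.

    Simplified assumption: minimum period = slowest unit delay + skew margin.
    """
    min_period_ns = max(unit_delays_ns) + skew_ns
    return 1e9 / min_period_ns  # one period in nanoseconds -> frequency in hertz


# Hypothetical per-unit delays in nanoseconds.
delays = [0.30, 0.35, 0.40]

ideal = max_frequency_hz(delays, skew_ns=0.0)
with_skew = max_frequency_hz(delays, skew_ns=0.10)

print(f"no skew:   {ideal / 1e9:.2f} GHz")
print(f"with skew: {with_skew / 1e9:.2f} GHz")
```

Under these assumed numbers, a 0.1 ns mismatch between units lowers the ceiling from about 2.5 GHz to about 2.0 GHz. The slowest unit and the synchronization margin set the limit together, regardless of how fast the other units are.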