That is why we need a set of auxiliary registers. This set should be fairly large, so that we can use these registers without very strict limitations (the Pentium 4 processor has 128 such registers). And since x86 programs know about only 8 general purpose registers, we need to teach these auxiliary registers to pretend to be the general purpose ones. So we need a special unit, called the register renaming unit. In fact, it handles only one thing: when any of the tasks needs a general purpose register (for instance, register “A”), this unit takes the first free register, names it “A”, and allows the task to store its result there when the operation is completed. And when the time comes (according to the order specified by the original program code), this unit takes the result from this register (thus freeing it again) and sends it where necessary.
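The renaming idea can be sketched in a few lines of Python. This is a toy model, not Intel's implementation: the class name `RenameUnit`, its methods, and the rule that the caller decides when to free a register are all hypothetical. It only shows the core trick described above: a pool of free physical registers plus a table that records which physical register currently plays the role of each architectural name such as “A”.

```python
from collections import deque


class RenameUnit:
    """Toy register renaming: maps architectural names onto a pool of physical registers."""

    def __init__(self, num_physical=128):
        # Pool of free physical registers; 128 mirrors the figure quoted for the Pentium 4.
        self.free = deque(range(num_physical))
        # Register alias table: architectural name -> physical register currently holding it.
        self.alias = {}

    def allocate(self, arch_name):
        """Give a task a fresh physical register that will play the role of arch_name."""
        if not self.free:
            raise RuntimeError("no free physical registers: the task must wait")
        phys = self.free.popleft()          # take the first free register
        previous = self.alias.get(arch_name)
        self.alias[arch_name] = phys        # later readers of arch_name see this register
        return phys, previous

    def lookup(self, arch_name):
        """Return the physical register that currently stands for arch_name."""
        return self.alias[arch_name]

    def release(self, phys):
        """Return a physical register to the pool (in this toy model the caller decides when)."""
        self.free.append(phys)


ru = RenameUnit(num_physical=4)
first, _ = ru.allocate("A")           # first write to "A" gets physical register 0
second, old = ru.allocate("A")        # an independent write to "A" gets physical register 1
print(first, second, ru.lookup("A"))  # 0 1 1
ru.release(old)                       # the superseded register goes back to the pool
```

Because the two writes to “A” land in different physical registers, neither has to wait for the other, which is exactly the freedom the auxiliary register set is meant to provide.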
The same thing happens in the Pentium 4 processor: if certain tasks can be completed out of order, the control logic executes them as if in draft and saves the results in the special registers. Later, once all the preceding operations have also been completed, the control logic commits the results in the exact order specified by the program. Out-of-order execution of commands is another important responsibility of the Back End block.
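The “draft, then commit in order” scheme is usually modelled with a reorder buffer. The sketch below is a simplified illustration under assumed names (`ReorderBuffer`, `issue`, `complete`, `retire` are hypothetical, not taken from Intel documentation): operations are recorded in program order, may finish in any order, and are committed only from the oldest end of the queue.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Toy reorder buffer: results may finish in any order but are committed in program order."""

    def __init__(self):
        # seq number -> [name, done, value]; insertion order equals program order
        self.entries = OrderedDict()
        self.next_seq = 0

    def issue(self, name):
        """Record an operation in program order and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.entries[seq] = [name, False, None]
        return seq

    def complete(self, seq, value):
        """An execution unit has finished: keep the result as a 'draft' until retirement."""
        entry = self.entries[seq]
        entry[1] = True
        entry[2] = value

    def retire(self):
        """Commit finished operations from the oldest end, stopping at the first unfinished one."""
        retired = []
        while self.entries:
            _, (name, done, value) = next(iter(self.entries.items()))
            if not done:
                break
            self.entries.popitem(last=False)
            retired.append((name, value))
        return retired


rob = ReorderBuffer()
a = rob.issue("op1")
b = rob.issue("op2")
c = rob.issue("op3")
rob.complete(c, 30)    # op3 finishes first, out of order
rob.complete(b, 20)
print(rob.retire())    # [] because op1 is still executing
rob.complete(a, 10)
print(rob.retire())    # [('op1', 10), ('op2', 20), ('op3', 30)]
```

The key design point is that `retire` stops at the first unfinished entry, so a fast operation can compute early but can never become visible before an older, slower one.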
Now let’s move on to the Front End, the group of units responsible for decoding and storing instructions.
First of all, let me describe the problem the CPU developers faced. We have already seen that each pipeline stage must be made as simple as possible in order to increase the working frequency. However, the irregular structure and varying complexity of x86 instructions become an obstacle here: x86 instructions differ in length, in the number of operands they require, and even in format. Even two instructions of the same length can require completely different effort and resources to execute. There are two ways out: we can either make the execution units “smart” enough to handle any instruction on their own, or we can try to make the instructions arriving at the Back End much simpler.
The first way out inevitably results in a lower working frequency, because each functional unit would have to decode and handle every kind of instruction on its own, which makes each unit more complex and slower. Note that really complex instructions occur very rarely, so such a complicated design of the execution units would be nothing but a waste of resources.
The second way out implies that we need a unit that turns x86 instructions of varying length (up to 15 bytes!) and complexity into simple, easy-to-understand commands for the execution units. The resulting instructions should have a regular structure and be suitable for smooth execution in very simple functional units. By regular structure we mean the following: the same instruction length, a standard representation format (location of operands and service fields), and roughly the same execution complexity. If an x86 command cannot be represented as a single simple instruction, it can be transformed into a chain of simple instructions.
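To make the idea of a regular internal format concrete, here is a toy decoder. It is only an illustration under stated assumptions: the `MicroOp` layout, the `decode` function, the scratch register name `tmp0`, and the text-based instruction syntax are all hypothetical and do not describe the real Pentium 4 micro-op encoding. Every output command has the same fields in the same places, and an instruction that modifies memory is split into a chain of three simple commands.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MicroOp:
    """Fixed-format command: every micro-op has the same fields in the same places."""
    op: str                # operation, e.g. "load", "add", "store"
    dst: Optional[str]     # destination register (None if the op writes only memory)
    src1: Optional[str]    # first source register or address register
    src2: Optional[str]    # second source register, if any


def decode(instr: str) -> List[MicroOp]:
    """Translate a toy x86-like instruction string into one or more micro-ops.

    Supported forms (hypothetical, for illustration only):
      "add reg, reg"    -> a single micro-op
      "add [addr], reg" -> load + add + store (a read-modify-write becomes a chain)
    """
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    if dst.startswith("[") and dst.endswith("]"):
        addr = dst[1:-1]
        tmp = "tmp0"  # hypothetical internal scratch register
        return [
            MicroOp("load", tmp, addr, None),    # read the memory operand
            MicroOp(mnemonic, tmp, tmp, src),    # do the arithmetic in a register
            MicroOp("store", None, addr, tmp),   # write the result back to memory
        ]
    # Register-to-register form already fits the simple format.
    return [MicroOp(mnemonic, dst, dst, src)]


for uop in decode("add [ebx], eax"):
    print(uop)
```

The simple register form maps to exactly one micro-op, while the memory form expands into three, so the execution units only ever see short, uniform commands regardless of how the original instruction was encoded.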