Chapter IX: Replay System In Depth
Before we move on to discussing the recently discovered phenomenon called replay, let's review a few important basics. They will really help us later.
First of all, as you know, the performance of any CPU depends heavily on how densely the instruction flow feeds its execution units. Ideally, the execution units should perform useful work on every single processor clock. So, the primary goal of the pipeline control logic is to keep the processor's execution units loaded to their maximum capacity.
In our particular case, the NetBurst micro-architecture, this means the following: the scheduler that feeds micro-operations to the execution units should do everything possible to minimize their idle time. Remember this conclusion; it will come in handy later.
This was the introduction. Now let's approach the topic of our discussion from a slightly different angle. Imagine a pipeline where the scheduler sits right in front of the execution units. When the scheduler sends a micro-operation for execution, the micro-operation grabs its operands and gets processed. Everything is beautiful! If the next micro-operation needs the result of the previous one as an operand, the scheduler can dispatch it after a certain time interval. This interval is determined by the time the first micro-operation requires for execution (to be more exact, the time until the result of that execution is ready for further use).
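This dispatch timing can be illustrated with a minimal Python sketch. It is a hypothetical toy model, not NetBurst's actual logic: the micro-operation names, latencies, and the one-dispatch-per-clock scheduler are all illustrative assumptions.

```python
# Toy model of a scheduler sitting directly in front of the execution units.
# A dependent micro-operation is dispatched on the exact clock at which its
# producer's result becomes available.

# Each micro-op: (name, execution latency in clocks, producer it depends on or None)
uops = [
    ("load",  3, None),    # assumed 3-clock latency
    ("add",   1, "load"),  # needs the load's result
    ("shift", 1, "add"),   # needs the add's result
]

def schedule(uops):
    ready_at = {}          # clock at which each micro-op's result is available
    clock = 0              # earliest clock the scheduler can dispatch on
    for name, latency, dep in uops:
        # Wait until the operand is ready; since the scheduler is right next
        # to the execution units, it can dispatch on that very clock.
        dispatch = max(clock, ready_at.get(dep, 0))
        ready_at[name] = dispatch + latency
        print(f"clock {dispatch}: dispatch {name}, result ready at clock {ready_at[name]}")
        clock = dispatch + 1   # the scheduler handles one micro-op per clock
    return ready_at

schedule(uops)
```

Running the sketch shows the "add" waiting until clock 3, when the load's result appears, and the "shift" following right behind it.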
Everything is just fine as long as the scheduler sits right next to the execution units. But as we remember, one of the key goals of the NetBurst micro-architecture was to increase the processor's working frequencies. As a result, the pipeline got longer, and a few more stages appeared between the scheduler and the execution units. They are no longer adjacent to one another.
Actually, this is not such a big problem: micro-operations can be sent for execution in advance, taking the additional pipeline stages into account. In other words, the scheduler should release a micro-operation a few clock cycles early, so that it covers the distance to the execution units just in time. This means that commands must be sent out before the result of the previous micro-operation is actually available.
What’s the result of this measure? Once the scheduler releases the first micro-operation, it can already select the next one from the queue on the second clock cycle. After the second micro-operation is sent down the pipeline, it tackles the third one, and so on: the scheduler keeps releasing micro-operations onto the pipeline, and the execution units keep executing them, which keeps the entire pipeline busy. By the time the first micro-operation is about to be executed, the following micro-operations are already moving through the intermediate pipeline stages, and the scheduler is already working on the next one.
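The early-release mechanism described above can be sketched in the same toy-model style. Again, this is a hypothetical illustration under stated assumptions: the number of stages between the scheduler and the execution units, the micro-op names, and the latencies are all made up for the example, not NetBurst's real values.

```python
# Toy model: with STAGES extra pipeline stages between the scheduler and the
# execution units, the scheduler must release a dependent micro-operation
# STAGES clocks *before* its operand is actually ready, betting that the
# result will arrive just as the micro-op reaches the execution units.

STAGES = 2  # assumed number of stages between scheduler and execution units

uops = [
    ("load", 3, None),    # assumed 3-clock latency
    ("add",  1, "load"),  # needs the load's result
]

def schedule_in_advance(uops):
    ready_at = {}          # clock at which each micro-op's result is available
    clock = 0              # earliest clock the scheduler can release on
    for name, latency, dep in uops:
        # The micro-op reaches the execution units STAGES clocks after
        # release, so release it STAGES clocks before the operand is ready.
        operand_ready = ready_at.get(dep, 0)
        release = max(clock, operand_ready - STAGES)
        exec_start = release + STAGES        # arrives at the execution units
        ready_at[name] = exec_start + latency
        print(f"clock {release}: release {name} "
              f"(executes at clock {exec_start}, result at clock {ready_at[name]})")
        clock = release + 1   # the scheduler handles one micro-op per clock
    return ready_at

schedule_in_advance(uops)
```

Note that the "add" is released on clock 3, two clocks before the load's result is ready on clock 5: the release is speculative, which is exactly what opens the door to the replay phenomenon discussed in this chapter.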