Say we have a hole between micro-operations in the replay loop. When they return to the multiplexer after completing the first replay loop, there is an opportunity to release another micro-operation for execution in place of the hole. The scheduler will certainly not miss this opportunity: its major goal is to load the execution units as efficiently as possible. In fact, this is not so much the scheduler’s enthusiasm as its complete unawareness of the execution status of the micro-operations it released earlier, so it simply follows the optimistic strategy described above blindly.
But it is not that simple. Suppose the next command is another dependent micro-operation from our dependency chain. It needs the results of the micro-operations that preceded it in the original program code. Yet now it has ended up ahead of those commands on the way to the execution units, because they were circling around the replay pipeline! Of course, it arrives at the execution unit before the data it needs is ready, gets executed incorrectly, and is sent to replay.
So, on the next replay loop, when the beginning of our uop chain has been executed with correct operands, we witness a mirror image of what we have just described: a lot of empty positions in the replay loop and one command that was sent there too early. And the “enthusiastic” scheduler will immediately fill all the empty positions of the replay loop with new micro-operations, including the positions ahead of our micro-operation, which get filled with commands that were meant to follow it, not to precede it!
I believe most readers have already got the point here… Absolutely correct: the entire micro-operations chain will go to replay. Our command will finally be executed, and all the other commands, passing through the replay, will recreate the familiar situation: a replay loop with micro-operations and a hole between them. The circle is closed: the scheduler keeps sending new commands from the dependency chain to the execution units, fitting them into every available hole between the micro-operations returned by the replay system, yet it has no idea which commands have already been executed successfully.
So, what do we get in the long run? It turns out that the entire micro-operations dependency chain has to go through replay. For a more illustrative example, imagine that this dependency chain is a metal chain of small links. Wrap it around a pen so that it forms a loop, and then pull one end of it. What do you see? The entire chain slides around the pen.
It is exactly the same with our chain of micro-operations. It will have to make at least one replay round, and maybe even more, depending on how long it takes to get the data ready.
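To make the "whole chain slides around the pen" effect more tangible, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions and not a description of the real NetBurst hardware: the function name `simulate_chain`, its parameters, the one-cycle execution latency, and the one-uop-per-cycle release rate are all hypothetical choices made for illustration. Each uop depends only on the previous one, the scheduler releases uops without checking whether the producer succeeded, and a uop that executes with missing operands comes back after a fixed number of cycles.

```python
def simulate_chain(chain_length, data_ready_cycle, replay_loop_cycles):
    """Return how many replay rounds each uop of a dependency chain makes.

    Toy model (all assumptions, not measured NetBurst behaviour):
      * uop i depends only on uop i-1; uop 0 depends on data that
        becomes valid at data_ready_cycle;
      * the scheduler optimistically releases one uop per cycle,
        in program order, so uop i first arrives at cycle i;
      * a uop that arrives before its operand is valid is replayed
        and re-arrives replay_loop_cycles later;
      * a correctly executed uop makes its result valid one cycle later.
    """
    replays = []
    operand_valid_at = data_ready_cycle
    for i in range(chain_length):
        arrival = i
        rounds = 0
        # Keep circling in the replay loop until the operand is valid.
        while arrival < operand_valid_at:
            arrival += replay_loop_cycles
            rounds += 1
        replays.append(rounds)
        # The next uop in the chain needs this uop's result.
        operand_valid_at = arrival + 1
    return replays


print(simulate_chain(5, 0, 7))   # data ready at once  -> [0, 0, 0, 0, 0]
print(simulate_chain(5, 3, 7))   # short delay         -> [1, 1, 1, 1, 1]
print(simulate_chain(5, 10, 7))  # longer delay        -> [2, 2, 2, 2, 2]
```

In this simplified model a delay on the very first uop is enough to send every uop of the chain through the same number of replay rounds, and a longer data delay raises that number for the whole chain. The values 3, 7 and 10 are arbitrary and are not taken from any measurement.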
And this process can be over only when the chain of dependent micro-operations is over. So, it looks like a single “hole” can result in an “endless” replay, unless the entire chain of micro-operations is completed. Or unless some other “magical” system steps in.