We paid special attention to them because, unlike a change in data size, they are very hard to foresee in the program code. Our tests of STLF violations showed that they can cause re-execution in either RL-7 or RL-12, depending on the type of violation: RL-7 is used when STD is not ready, and RL-12 when STA is not ready. The reason is fairly simple. When a Load is executed, the CPU searches the Store Buffer for the data; once an entry with a matching address is found, any absence of data is detected immediately. However, if the data is there but the address hasn’t arrived yet, the CPU can only hope for the best and assume that this Store will not affect the Load. Later on it performs a Check, and if the check fails, the operation is sent to RL-12.
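The Store Buffer lookup described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not a description of the real hardware: the names StoreEntry and classify_load are hypothetical, the buffer is scanned from the youngest store to the oldest, and a store whose address is still unknown is assumed to end up in RL-12 whenever its address later turns out to match, regardless of whether its data was ready.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreEntry:
    """One Store Buffer entry as seen at the moment the Load probes it (hypothetical model)."""
    true_addr: int    # address the store will eventually write to
    addr_ready: bool  # True if STA has already executed
    data_ready: bool  # True if STD has already executed


def classify_load(store_buffer: List[StoreEntry], load_addr: int) -> str:
    """Classify a Load against the buffer; store_buffer is ordered oldest first."""
    # Assumption: the youngest matching store wins, so scan in reverse order.
    for entry in reversed(store_buffer):
        if entry.addr_ready:
            if entry.true_addr == load_addr:
                # Address match is visible now, so missing data is detected immediately.
                return "forward" if entry.data_ready else "replay RL-7"
        elif entry.true_addr == load_addr:
            # Address was unknown: the Load speculated "no conflict",
            # and the later Check fails.
            return "replay RL-12"
    return "no conflict"


if __name__ == "__main__":
    print(classify_load([StoreEntry(0x100, True, False)], 0x100))
    print(classify_load([StoreEntry(0x100, False, True)], 0x100))
```

The model only reproduces the decision described in the text (STD not ready leads to RL-7, STA not ready leads to RL-12); it says nothing about timing or about how many replay passes are needed.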
To ensure that the LD is executed correctly on the first attempt, it should be sent for execution no earlier than STA and STD. Note that it should be sent at least 3 clocks later than STA. The processor will not allow the Load to be executed ahead of STA; however, 3 clock cycles is a fairly large gap, so the Load queue scheduler will try to take advantage of it, because it knows nothing about the hidden dependence of the addresses on the Store.
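The timing rule above can be written down as a simple check. The sketch below is an illustration under assumptions, not a model of the actual scheduler: the function name stlf_violations and its parameters are hypothetical, each operation is reduced to the single clock at which it is sent for execution, and the 3-clock gap is the figure quoted in the text.

```python
from typing import List

STA_TO_LOAD_GAP = 3  # clocks; the minimum STA-to-Load distance quoted in the text


def stlf_violations(sta_clock: int, std_clock: int, load_clock: int) -> List[str]:
    """List the STLF violations a Load would hit for the given dispatch clocks."""
    violations = []
    # The Load must be sent at least 3 clocks after STA.
    if load_clock - sta_clock < STA_TO_LOAD_GAP:
        violations.append("STA not ready -> RL-12")
    # The Load must not be sent earlier than STD.
    if load_clock < std_clock:
        violations.append("STD not ready -> RL-7")
    return violations


if __name__ == "__main__":
    # Load sent only 2 clocks after STA: the address is not ready yet.
    print(stlf_violations(sta_clock=0, std_clock=0, load_clock=2))
    # Store data arrives late (e.g. after a long calculation).
    print(stlf_violations(sta_clock=0, std_clock=10, load_clock=5))
```

An empty list means the Load was sent late enough with respect to both STA and STD, so under these assumptions no replay is expected.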
The Pentium 4 architecture provides excellent opportunities for loading data ahead of time: this is where the address operation queues come in handy. As a result, you cannot guarantee that STLF violations will be avoided even if you adjust the algorithm implementation. Any pair of dependent Store-Load operations within a “window” of a few dozen instructions is a potential hazard.
One of the most illustrative examples is the function call, which appears in virtually every program. A function call is usually performed by a chain of commands that store the parameters on the stack (PUSH), followed by the actual CALL command.
Inside the function, the registers are then saved and the parameters are read:
MOV EAX, [ESP+8]
Note that the pair PUSH EAX and MOV EAX, [ESP+8] is a potential cause of replay under the STLF Restriction on Data Availability in two cases: when STD is not ready (the procedure receives the result of a long calculation), and when STA is not ready (the scheduler sends MOV EAX, [ESP+8] for execution less than 3 clocks after PUSH EAX).